Abstract

This paper presents sufficient conditions for the existence of stationary optimal policies for average- 
cost Markov Decision Processes with Borel state and action sets and with weakly continuous transition 
probabilities. The one-step cost functions may be unbounded, and action sets may be noncompact. The 
main contributions of this paper are: (i) general sufficient conditions for the existence of stationary 
discount-optimal and average-cost optimal policies and descriptions of properties of value functions and 
sets of optimal actions, (ii) a sufficient condition for the average-cost optimality of a stationary policy in 
the form of optimality inequalities, and (iii) approximations of average-cost optimal actions by discount- 
optimal actions. 

1 Introduction 

This paper provides sufficient conditions for the existence of stationary optimal policies for
average-cost Markov Decision Processes (MDPs) with Borel state and action sets and with weakly 
continuous transition probabilities. The cost functions may be unbounded and action sets may 
be noncompact. The main contributions of this paper are: (i) general sufficient conditions for 
the existence of stationary discount-optimal and average-cost optimal policies and descriptions of 
properties of value functions and sets of optimal actions (Theorems 4.1, 5.2 and 5.6), (ii) a new
sufficient condition of average-cost optimality based on optimality inequalities (Theorem 3.1), and
(iii) approximations of average-cost optimal actions by discount-optimal actions (Theorem 6.1).

For infinite-horizon MDPs there are two major criteria: average costs per unit time and expected 
total discounted costs. The former is typically more difficult to analyze. The so-called vanishing 
discount factor approach is often used to approximate average costs per unit time by normalized 
expected total discounted costs. The literature on average-cost MDPs is vast. Most of the earlier
results are surveyed in Arapostathis et al. [1]. Here we mention just a few references.

For finite state and action sets, Derman [10] proved the existence of stationary average-cost
optimal policies. This result follows from Blackwell [[H and it also was independently proved by 
Viskov and Shiryaev [29]. When either the state set or the action set is infinite, even $\varepsilon$-optimal
policies may not exist for some $\varepsilon > 0$; Ross [23], Dynkin and Yushkevich [11, Chapter 7],
Feinberg [12, Section 5]. For a finite state set and compact action sets, optimal policies may not exist;
Bather [2], Chitashvili [9], Dynkin and Yushkevich [11, Chapter 7].

For MDPs with finite state and action sets, there exist stationary policies satisfying optimality
equations (see Dynkin and Yushkevich [11, Chapter 7], where these equations are called canonical),
and, furthermore, any stationary policy satisfying optimality equations is optimal. The latter
is also true for MDPs with Borel state and action sets, if the value and weight (also called bias)
functions are bounded; Dynkin and Yushkevich [11, Chapter 7]. When the optimal value of average
costs per unit time does not depend on the initial state (the optimal value function is constant), the
pair of optimality equations becomes a single equation. For bounded one-step costs, Taylor [28] and
Ross [21] for a countable state space, and Ross [22] and Gubenko and Shtatland [15] for a Borel state
space, provided sufficient conditions for the validity of optimality equations with a bounded bias
function; see also Dynkin and Yushkevich [11, Chapter 7]. Under all known sufficient conditions
for the existence of average-cost optimal policies for infinite-state MDPs, the value function is 
constant. 

In many applications of infinite-state MDPs, one-step costs are unbounded from above. For 
example, holding costs may be unbounded in queueing and inventory systems. Sennott [25, 26]
(and references therein) developed a theory for countable-state problems with unbounded
one-step costs. For unbounded costs, optimality inequalities are used instead of optimality equations
to construct a stationary average-cost optimal policy. Cavazos-Cadena [7] provided an example
in which optimality inequalities hold while optimality equations do not.

Schäl [24] developed a theory for Borel state spaces and compact action sets. Two types of
continuity assumptions for transition probabilities are considered in Schäl [24]: setwise and
weak continuity. For a countable state space these assumptions coincide; see Chen and
Feinberg [8, Appendix]. Setwise convergence of probability measures is stronger than weak
convergence; Hernández-Lerma and Lasserre [17, p. 186]. Formally speaking, the setwise continuity
assumption for MDPs is not stronger than the weak continuity assumption, since the former requires
only that the transition probabilities are continuous in actions, while the latter requires that they
are jointly continuous in states and actions. However, the joint continuity of transition probabilities
in states and actions often holds in applications. For example, for inventory control problems with
uncountable state spaces, setwise continuity of transition probabilities takes place if demand is a
continuous random variable, while weak continuity holds for arbitrarily distributed demand; see
Feinberg and Lewis [14, Section 4]. The importance of weak convergence for practical applications
is mentioned in Hernández-Lerma and Lasserre [18, p. 141].

In many applications action sets are not compact. Hernández-Lerma [16] extended Schäl's [24]
results under the setwise continuity assumptions to possibly noncompact action sets. Schäl's [24]
assumptions on compactness of action sets and lower semi-continuity of cost functions in the
action argument are replaced in Hernández-Lerma [16] by a more general assumption, namely,
that the cost functions are inf-compact in the action argument. For weakly continuous transition
probabilities and possibly noncompact action sets, Feinberg and Lewis [14] proved the existence of
stationary optimal policies for MDPs with cost functions being inf-compact in both state and action
arguments when, in addition to Schäl's [24] boundedness assumption on the relative discounted value
at each state, the so-called local boundedness condition was assumed.

The original goal of this study was to show that the results from Feinberg and Lewis [14] hold
without the local boundedness condition. However, the results of this paper are more general. This
paper provides a weaker boundedness condition on the relative discounted value (Assumption ($\underline{B}$)
in Section 5) than Assumption (B) introduced in Schäl [24]. It also provides a more general and
natural assumption (Assumption (W*) in Section 3) than inf-compactness of the one-step cost
function in both arguments. The main result of this paper, Theorem 5.2, establishes the validity of
optimality inequalities and the existence of stationary optimal policies under Assumptions (W*)
and ($\underline{B}$).

While inf-compactness of the cost function in the action parameter is a natural assumption, 
inf-compactness in the state argument is a more restrictive condition. For example, when the state 
space is unbounded (e.g., the set of nonnegative numbers) and action sets are compact, the
assumption that the cost function is inf-compact in both arguments does not cover the case of bounded
cost functions studied by Ross [22], Gubenko and Shtatland [15], and Dynkin and Yushkevich [11,
Chapter 7]. Assumption (W*) covers this case as well as unbounded costs and noncompact action
sets. 

As follows from the example presented in Luque-Vásquez and Hernández-Lerma (1995),
MDPs with lower semi-continuous cost functions may possess pathological properties, even if the
one-step cost function is inf-compact in the action variable. Assumption (W*)(ii) removes this
difficulty. As stated in Lemma 3.2, this assumption is weaker than Schäl's [24] compactness and
continuity assumptions for weakly continuous transition probabilities and than inf-compactness of
one-step cost functions in both arguments (state and action) assumed in Feinberg and Lewis [14].

2 Model Description 

For a metric space $S$, let $\mathcal{B}(S)$ be the Borel $\sigma$-field on $S$, that is, the $\sigma$-field generated by all open
sets of the metric space $S$. For a set $E \subseteq S$, we denote by $\mathcal{B}(E)$ the $\sigma$-field whose elements are
intersections of $E$ with elements of $\mathcal{B}(S)$. Observe that $E$ is a metric space with the same metric as
on $S$, and $\mathcal{B}(E)$ is its Borel $\sigma$-field. For a metric space $S$, we denote by $\mathbb{P}(S)$ the set of probability
measures on $(S, \mathcal{B}(S))$. A sequence of probability measures $\{\mu_n\}$ from $\mathbb{P}(S)$ converges weakly to
$\mu \in \mathbb{P}(S)$ if for any bounded continuous function $f$ on $S$

$$\int_S f(s)\,\mu_n(ds) \to \int_S f(s)\,\mu(ds) \quad \text{as } n \to \infty.$$
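
The difference between weak and setwise convergence can be seen on a small illustration that is not taken from the paper: the point masses at $1/n$ converge weakly to the point mass at $0$, but not setwise. The sketch below uses hypothetical helper names and checks a single bounded continuous test function, so it illustrates the definition rather than proving convergence.

```python
def dirac_integral(f, atom):
    """Integral of f with respect to the point mass (Dirac measure) at `atom`."""
    return f(atom)


def bounded_continuous(s):
    """A bounded continuous test function on the real line."""
    return 1.0 / (1.0 + s * s)


def indicator_of_zero(s):
    """Indicator of the set {0}: bounded and measurable, but not continuous."""
    return 1.0 if s == 0.0 else 0.0


ns = [1, 10, 100, 1000]

# Gap between the integrals against delta_{1/n} and delta_0 for a continuous f:
# it shrinks as n grows, as weak convergence requires.
weak_gaps = [
    abs(dirac_integral(bounded_continuous, 1.0 / n) - dirac_integral(bounded_continuous, 0.0))
    for n in ns
]

# The same gap for the indicator of {0}: it stays equal to 1, so the measures
# of the set {0} do not converge and setwise convergence fails.
setwise_gaps = [
    abs(dirac_integral(indicator_of_zero, 1.0 / n) - dirac_integral(indicator_of_zero, 0.0))
    for n in ns
]
```

Here `weak_gaps` decreases toward zero while every entry of `setwise_gaps` equals one.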

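Consider a discrete-time MDP with a state space $\mathbb{X}$, an action space $\mathbb{A}$, one-step costs $c$, and
transition probabilities $q$. Assume that $\mathbb{X}$ and $\mathbb{A}$ are Borel subsets of Polish (complete separable
metric) spaces with the corresponding metrics $\rho$ and $\gamma$. For all $x \in \mathbb{X}$ a nonempty Borel subset
$A(x)$ of $\mathbb{A}$ represents the set of actions available at $x$. Define the graph of $A$ by

$$\mathrm{Gr}(A) = \{(x, a) : x \in \mathbb{X},\ a \in A(x)\}.$$

Assume also that

(i) $\mathrm{Gr}(A)$ is a measurable subset of $\mathbb{X} \times \mathbb{A}$, that is, $\mathrm{Gr}(A) \in \mathcal{B}(\mathbb{X} \times \mathbb{A})$, where
$\mathcal{B}(\mathbb{X} \times \mathbb{A}) = \mathcal{B}(\mathbb{X}) \otimes \mathcal{B}(\mathbb{A})$;

(ii) there exists a measurable mapping $\phi : \mathbb{X} \to \mathbb{A}$ such that $\phi(x) \in A(x)$ for all $x \in \mathbb{X}$.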

The one-step cost, $c(x, a) \le +\infty$, for choosing an action $a \in A(x)$ in a state $x \in \mathbb{X}$, is a
bounded below measurable function on $\mathrm{Gr}(A)$. Let $q(B|x, a)$ be the transition kernel representing
the probability that the next state is in $B \in \mathcal{B}(\mathbb{X})$, given that the action $a$ is chosen in the state $x$.
This means that:

• $q(\cdot|x, a)$ is a probability measure on $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ for all $(x, a) \in \mathbb{X} \times \mathbb{A}$;

• $q(B|\cdot, \cdot)$ is a Borel function on $(\mathrm{Gr}(A), \mathcal{B}(\mathrm{Gr}(A)))$ for all $B \in \mathcal{B}(\mathbb{X})$.

The decision process proceeds as follows:

• at each time epoch $n = 0, 1, \ldots$ the current state $x \in \mathbb{X}$ is observed;

• a decision-maker chooses an action $a \in A(x)$;

• the cost $c(x, a)$ is incurred;

• the system moves to the next state according to the probability law $q(\cdot|x, a)$.

As explained in the text following the proof of Lemma 3.3, if for each $x \in \mathbb{X}$ there exists $a \in A(x)$
with $c(x, a) < \infty$, the measurability of $\mathrm{Gr}(A)$ and inf-compactness of the cost function $c$ in the
action variable $a$ assumed later imply that assumption (ii) holds.

Let $\mathbb{H}_n = (\mathbb{X} \times \mathbb{A})^n \times \mathbb{X}$ be the set of histories by time $n = 0, 1, \ldots$ and
$\mathcal{B}(\mathbb{H}_n) = (\mathcal{B}(\mathbb{X}) \otimes \mathcal{B}(\mathbb{A}))^n \otimes \mathcal{B}(\mathbb{X})$. A randomized decision rule at epoch $n = 0, 1, \ldots$ is a regular
transition probability $\pi_n : \mathbb{H}_n \to \mathbb{A}$ concentrated on $A(\xi_n)$, that is, (i) $\pi_n(\cdot|h_n)$ is a probability on
$(\mathbb{A}, \mathcal{B}(\mathbb{A}))$, given the history $h_n = (\xi_0, u_0, \xi_1, \ldots, u_{n-1}, \xi_n) \in \mathbb{H}_n$, satisfying $\pi_n(A(\xi_n)|h_n) = 1$, and
(ii) for all $B \in \mathcal{B}(\mathbb{A})$, the function $\pi_n(B|\cdot)$ is Borel on $(\mathbb{H}_n, \mathcal{B}(\mathbb{H}_n))$. A policy is a sequence
$\pi = \{\pi_n\}_{n=0,1,\ldots}$ of decision rules. Moreover, $\pi$ is called nonrandomized if each probability measure
$\pi_n(\cdot|h_n)$ is concentrated at one point. A nonrandomized policy is called Markov if all of the decisions
depend on the current state and time only. A Markov policy is called stationary if all the decisions depend
on the current state only. Thus, a Markov policy is defined by a sequence $\phi_0, \phi_1, \ldots$ of Borel
mappings $\phi_n : \mathbb{X} \to \mathbb{A}$ such that $\phi_n(x) \in A(x)$ for all $x \in \mathbb{X}$. A stationary policy is defined by a
Borel mapping $\phi : \mathbb{X} \to \mathbb{A}$ such that $\phi(x) \in A(x)$ for all $x \in \mathbb{X}$. Let

$$\mathbb{F} = \{\phi : \mathbb{X} \to \mathbb{A} : \phi \text{ is Borel and } \phi(x) \in A(x) \text{ for all } x \in \mathbb{X}\}$$

be the set of stationary policies.

The Ionescu Tulcea theorem (Bertsekas and Shreve [4, pp. 140-141] or Hernández-Lerma and
Lasserre [17, p. 178]) implies that an initial state $x$ and a policy $\pi$ define a unique probability $P_x^\pi$ on
the set of all trajectories $\mathbb{H}_\infty = (\mathbb{X} \times \mathbb{A})^\infty$ endowed with the product $\sigma$-field defined by the Borel
$\sigma$-fields of $\mathbb{X}$ and $\mathbb{A}$. Let $E_x^\pi$ be an expectation with respect to $P_x^\pi$.

For a finite horizon $N = 0, 1, \ldots$, let us define the expected total discounted costs

$$v_{N,\alpha}^\pi(x) := E_x^\pi \sum_{n=0}^{N-1} \alpha^n c(\xi_n, u_n), \quad x \in \mathbb{X}, \tag{2.1}$$

where $\alpha \ge 0$ is the discount factor and $v_{0,\alpha}^\pi(x) = 0$. When $\alpha = 1$, we shall write $v_N^\pi(x)$ instead of
$v_{N,1}^\pi(x)$. When $N = \infty$ and $\alpha \in [0, 1)$, (2.1) defines an infinite horizon expected total discounted
cost denoted by $v_\alpha^\pi(x)$.

The average cost per unit time is defined as

$$w^\pi(x) := \limsup_{N \to \infty} \frac{1}{N} v_N^\pi(x), \quad x \in \mathbb{X}. \tag{2.2}$$

For any function $g^\pi(x)$, including $g^\pi(x) = v_{N,\alpha}^\pi(x)$, $g^\pi(x) = v_\alpha^\pi(x)$, and $g^\pi(x) = w^\pi(x)$, define
the optimal cost

$$g(x) := \inf_{\pi \in \Pi} g^\pi(x), \quad x \in \mathbb{X},$$

where $\Pi$ is the set of all policies.

A policy $\pi$ is called optimal for the respective criterion if $g^\pi(x) = g(x)$ for all $x \in \mathbb{X}$. For
$g^\pi = v_{n,\alpha}^\pi$, the optimal policy is called $n$-horizon discount-optimal; for $g^\pi = v_\alpha^\pi$, it is called
discount-optimal; for $g^\pi = w^\pi$, it is called average-cost optimal.

It is well known (see, e.g., Bertsekas and Shreve [4, Proposition 8.2]) that the functions $v_{n,\alpha}(x)$
recursively satisfy the following optimality equations with $v_{0,\alpha}(x) = 0$ for all $x \in \mathbb{X}$,

$$v_{n+1,\alpha}(x) = \inf_{a \in A(x)} \left\{ c(x, a) + \alpha \int_{\mathbb{X}} v_{n,\alpha}(y)\, q(dy|x, a) \right\}, \quad x \in \mathbb{X},\ n = 0, 1, \ldots. \tag{2.3}$$

In addition, a Markov policy $\phi$, defined at the first $N$ steps by the mappings $\phi_0, \ldots, \phi_{N-1}$ that satisfy
for all $n = 1, \ldots, N$ the equations

$$v_{n,\alpha}(x) = c(x, \phi_{N-n}(x)) + \alpha \int_{\mathbb{X}} v_{n-1,\alpha}(y)\, q(dy|x, \phi_{N-n}(x)), \quad x \in \mathbb{X}, \tag{2.4}$$

is optimal for the horizon $N$; see e.g. Bertsekas and Shreve [4, Lemma 8.7].
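
For a finite state and action model, the recursion (2.3) and the selection rule (2.4) can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: states and actions are finite, integrals become sums, and the function name, the dictionary layout, and the two-state data are hypothetical and do not come from the paper.

```python
def finite_horizon_values(states, actions, cost, trans, alpha, horizon):
    """Value iteration (2.3) on a finite MDP, plus a Markov policy chosen as in (2.4).

    Returns (values, policy): values[n][x] is v_{n,alpha}(x) and policy[t][x] is
    the action phi_t(x) used at epoch t of an N-step problem, N = horizon.
    """
    values = [{x: 0.0 for x in states}]  # v_{0,alpha} = 0
    minimizers = []  # minimizers[n][x] attains the minimum defining v_{n+1,alpha}(x)
    for _ in range(horizon):
        previous = values[-1]
        current, argmin = {}, {}
        for x in states:
            best_action, best_value = None, float("inf")
            for a in actions[x]:
                expected = sum(p * previous[y] for y, p in trans[(x, a)].items())
                candidate = cost[(x, a)] + alpha * expected
                if candidate < best_value:
                    best_action, best_value = a, candidate
            current[x], argmin[x] = best_value, best_action
        values.append(current)
        minimizers.append(argmin)
    # In (2.4), phi_{N-n} attains the minimum for v_n, so phi_t = minimizers[N-1-t].
    policy = [minimizers[horizon - 1 - t] for t in range(horizon)]
    return values, policy


# Hypothetical two-state example: staying in state 0 costs 1 per step,
# moving to the cost-free absorbing state 1 costs 3 once.
states = [0, 1]
actions = {0: ["stay", "move"], 1: ["stay"]}
cost = {(0, "stay"): 1.0, (0, "move"): 3.0, (1, "stay"): 0.0}
trans = {(0, "stay"): {0: 1.0}, (0, "move"): {1: 1.0}, (1, "stay"): {1: 1.0}}
values, policy = finite_horizon_values(states, actions, cost, trans, alpha=0.9, horizon=5)
```

In this example the resulting Markov policy is not stationary: with five steps to go it moves out of state 0, while with a single step left it stays, which is why (2.4) indexes the mappings by the remaining horizon.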

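It is also well known (Bertsekas and Shreve [4, Propositions 9.8 and 9.12]) that $v_\alpha$, where
$\alpha \in (0, 1]$, satisfies the following discounted cost optimality equation (DCOE):

$$v_\alpha(x) = \inf_{a \in A(x)} \left\{ c(x, a) + \alpha \int_{\mathbb{X}} v_\alpha(y)\, q(dy|x, a) \right\}, \quad x \in \mathbb{X}, \tag{2.5}$$

and a stationary policy $\phi_\alpha$ is discount-optimal if and only if

$$v_\alpha(x) = c(x, \phi_\alpha(x)) + \alpha \int_{\mathbb{X}} v_\alpha(y)\, q(dy|x, \phi_\alpha(x)), \quad x \in \mathbb{X}. \tag{2.6}$$
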
3 General Assumptions and Auxiliary Results

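Following Schäl [24], consider the following assumption.

Assumption (G). $w^* := \inf_{x \in \mathbb{X}} w(x) < +\infty$.

This assumption is equivalent to the existence of $x \in \mathbb{X}$ and $\pi \in \Pi$ with $w^\pi(x) < \infty$. If
Assumption (G) does not hold then the problem is trivial, because $w(x) = \infty$ for all $x \in \mathbb{X}$ and
any policy $\pi$ is average-cost optimal. Define the following quantities for $\alpha \in [0, 1)$:

$$m_\alpha = \inf_{x \in \mathbb{X}} v_\alpha(x), \qquad u_\alpha(x) = v_\alpha(x) - m_\alpha,$$

$$\underline{w} = \liminf_{\alpha \uparrow 1} (1 - \alpha) m_\alpha, \qquad \overline{w} = \limsup_{\alpha \uparrow 1} (1 - \alpha) m_\alpha.$$

Observe that $u_\alpha(x) \ge 0$ for all $x \in \mathbb{X}$. According to Schäl [24, Lemma 1.2], Assumption (G)
implies

$$0 \le \underline{w} \le \overline{w} \le w^* < +\infty. \tag{3.1}$$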
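
The quantities $m_\alpha$, $u_\alpha$ and $(1 - \alpha) m_\alpha$ can be computed numerically for a small finite model to see the vanishing discount factor approach at work. The following is a rough sketch with a hypothetical two-state MDP (all names and data are illustrative assumptions); in this example the optimal average cost equals 3, and the normalized values $(1 - \alpha) m_\alpha$ approach it as $\alpha$ increases to 1.

```python
def discounted_values(states, actions, cost, trans, alpha, tol=1e-10, max_iter=1_000_000):
    """Approximate v_alpha by iterating the discounted optimality operator
    until successive iterates differ by less than `tol`."""
    v = {x: 0.0 for x in states}
    for _ in range(max_iter):
        new_v = {
            x: min(
                cost[(x, a)] + alpha * sum(p * v[y] for y, p in trans[(x, a)].items())
                for a in actions[x]
            )
            for x in states
        }
        if max(abs(new_v[x] - v[x]) for x in states) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical model: "go" alternates between the states with costs 2 and 4,
# "wait" keeps the process in state 0 at cost 5 per step.
states = [0, 1]
actions = {0: ["go", "wait"], 1: ["go"]}
cost = {(0, "go"): 2.0, (0, "wait"): 5.0, (1, "go"): 4.0}
trans = {(0, "go"): {1: 1.0}, (0, "wait"): {0: 1.0}, (1, "go"): {0: 1.0}}

normalized = {}  # alpha -> (1 - alpha) * m_alpha
relative = {}    # alpha -> the function u_alpha
for alpha in (0.9, 0.99, 0.999):
    v = discounted_values(states, actions, cost, trans, alpha)
    m = min(v.values())
    normalized[alpha] = (1.0 - alpha) * m
    relative[alpha] = {x: v[x] - m for x in states}
```

In this model $v_\alpha$ grows without bound as $\alpha \uparrow 1$, while the relative values $u_\alpha$ stay bounded, which is the behavior that boundedness assumptions on $u_\alpha$ are meant to capture.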

According to Schäl [24, Proposition 1.3], under Assumption (G), if there exists a measurable
function $u : \mathbb{X} \to [0, +\infty)$ and a stationary policy $\phi$ such that

$$\underline{w} + u(x) \ge c(x, \phi(x)) + \int_{\mathbb{X}} u(y)\, q(dy|x, \phi(x)), \quad x \in \mathbb{X}, \tag{3.2}$$

then $\phi$ is average-cost optimal and $w(x) = w^* = \underline{w} = \overline{w}$ for all $x \in \mathbb{X}$. Here we need a different form
of such a statement.

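Theorem 3.1. Let Assumption (G) hold. If there exists a measurable function $u : \mathbb{X} \to [0, +\infty)$
and a stationary policy $\phi$ such that

$$\overline{w} + u(x) \ge c(x, \phi(x)) + \int_{\mathbb{X}} u(y)\, q(dy|x, \phi(x)), \quad x \in \mathbb{X}, \tag{3.3}$$

then $\phi$ is average-cost optimal and

$$w(x) = w^\phi(x) = \limsup_{\alpha \uparrow 1} (1 - \alpha) v_\alpha(x) = \overline{w} = w^*, \quad x \in \mathbb{X}. \tag{3.4}$$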

Proof. Similarly to Hernández-Lerma [16, p. 239] or Schäl [24, Proposition 1.3], since $u$ is
nonnegative, by iterating (3.3) we obtain

$$n \overline{w} + u(x) \ge v_n^\phi(x), \quad n \ge 1,\ x \in \mathbb{X}.$$

Therefore, after dividing the last inequality by $n$ and letting $n \to \infty$, we have

$$\overline{w} \ge w^\phi(x) \ge w(x) \ge w^*, \quad x \in \mathbb{X}, \tag{3.5}$$

where the second and the third inequalities follow from the definitions of $w$ and $w^*$ respectively.
Since $\overline{w} \ge w^*$, inequalities (3.1) imply that for all $\pi \in \Pi$

$$w^* = \overline{w} \le \limsup_{\alpha \uparrow 1} (1 - \alpha) v_\alpha(x) \le \limsup_{\alpha \uparrow 1} (1 - \alpha) v_\alpha^\pi(x) \le w^\pi(x), \quad \pi \in \Pi,\ x \in \mathbb{X}.$$

Finally, we obtain that

$$w^* = \overline{w} \le \limsup_{\alpha \uparrow 1} (1 - \alpha) v_\alpha(x) \le \inf_{\pi \in \Pi} w^\pi(x) = w(x) \le w^\phi(x) \le \overline{w}, \quad x \in \mathbb{X}, \tag{3.6}$$

where the last inequality follows from (3.5). Thus all the inequalities in (3.6) are equalities. □
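
The iteration step of the proof can be checked by hand on a deterministic toy model. The sketch below is an illustration only: the two-state cycle, the constant `w_bar` (playing the role of $\overline{w}$) and the function `u` are hypothetical choices made so that inequality (3.3) holds, and none of them come from the paper.

```python
# Deterministic two-state cycle under a fixed stationary policy phi:
# state 0 costs 2 and moves to state 1, state 1 costs 4 and moves to state 0.
cost = {0: 2.0, 1: 4.0}     # c(x, phi(x))
next_state = {0: 1, 1: 0}   # the transition under phi is deterministic
w_bar = 3.0                 # candidate constant, the long-run average of 2 and 4
u = {0: 0.0, 1: 1.0}        # candidate nonnegative function u


def inequality_holds(x):
    """Check inequality (3.3) at state x; the integral reduces to u(next state)."""
    return w_bar + u[x] >= cost[x] + u[next_state[x]]


def n_step_cost(x, n):
    """Undiscounted n-step cost v_n^phi(x) of following phi from state x."""
    total = 0.0
    for _ in range(n):
        total += cost[x]
        x = next_state[x]
    return total


checks = [inequality_holds(x) for x in (0, 1)]

# Iterating (3.3) gives n * w_bar + u(x) >= v_n^phi(x) for every n >= 1.
bounds_ok = all(
    n * w_bar + u[x] >= n_step_cost(x, n) for x in (0, 1) for n in range(1, 51)
)

# Dividing by n shows that the average cost of phi does not exceed w_bar.
average_after_1000 = n_step_cost(0, 1000) / 1000
```

Here both inequalities in (3.3) hold with equality, and the long-run average cost of the policy coincides with the chosen constant.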

Let us set $\mathbb{R} = (-\infty, +\infty)$, $\mathbb{R}_+ = [0, \infty)$, and $\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$. For an $\overline{\mathbb{R}}$-valued function $f$,
defined on a Borel subset $U$ of a Polish space $\mathbb{Y}$, consider the level sets

$$\mathcal{D}_f(\lambda) = \{y \in U : f(y) \le \lambda\}, \tag{3.7}$$

$-\infty < \lambda < +\infty$. We recall that the function $f$ is lower semi-continuous on $U$ if all the level sets
$\mathcal{D}_f(\lambda)$ are closed, and the function is inf-compact on $U$ if all these sets are compact. The level sets
$\mathcal{D}_f(\lambda)$ satisfy the following properties that are used in this paper:

(a) if $\lambda_1 > \lambda$ then $\mathcal{D}_f(\lambda) \subseteq \mathcal{D}_f(\lambda_1)$;

(b) if $g$, $f$ are functions on $U$ satisfying $g(y) \ge f(y)$ for all $y \in U$ then $\mathcal{D}_g(\lambda) \subseteq \mathcal{D}_f(\lambda)$.

A set is called $\sigma$-compact if it is a union of a countable number of compact sets. Denote by
$K(\mathbb{A})$ the family of all nonempty compact subsets of $\mathbb{A}$ and by $K_\sigma(\mathbb{A})$ the family of all $\sigma$-compact
subsets of $\mathbb{A}$; $K(\mathbb{A}) \subset K_\sigma(\mathbb{A})$. Also denote by $S(\mathbb{A})$ the set of nonempty subsets of $\mathbb{A}$.

A set-valued mapping $F : \mathbb{X} \to S(\mathbb{A})$ is upper semi-continuous at $x \in \mathbb{X}$ if, for any
neighborhood $G$ of the set $F(x)$, there is a neighborhood of $x$, say $U(x)$, such that $F(y) \subseteq G$ for all
$y \in U(x)$ (see e.g., Berge [3, p. 109] or Zgurovsky et al. [30, Chapter 1, p. 7]). A set-valued
mapping is called upper semi-continuous if it is upper semi-continuous at all $x \in \mathbb{X}$.

For weakly continuous transition probabilities, the following basic assumptions were
considered in Schäl [24].

Assumption (W).

(i) $c$ is lower semi-continuous and bounded below on $\mathrm{Gr}(A)$;

(ii) $A(x) \in K(\mathbb{A})$ for $x \in \mathbb{X}$ and $A : \mathbb{X} \to K(\mathbb{A})$ is upper semi-continuous;

(iii) the transition probability $q(\cdot|x, a)$ is weakly continuous in $(x, a) \in \mathrm{Gr}(A)$.

Weak continuity of $q$ in $(x, a)$ means that

$$\int_{\mathbb{X}} f(z)\, q(dz|x_k, a_k) \to \int_{\mathbb{X}} f(z)\, q(dz|x, a) \quad \text{as } k \to \infty$$

for any sequence $\{(x_k, a_k), k \ge 0\}$ converging to $(x, a)$, where $(x_k, a_k), (x, a) \in \mathrm{Gr}(A)$, and for
any bounded continuous function $f : \mathbb{X} \to \mathbb{R}$. We notice that there is an additional assumption in
Schäl [24], namely, that $\mathbb{X}$ is a locally compact space with countable base. However, as follows
from this paper, this assumption is not necessary here as well as in Feinberg and Lewis [14], since
there exists at least one stationary policy. We also remark that the assumptions in (W) are
presented here in a different order than in Schäl [24], and that it is assumed in Schäl [24] that $c$
is nonnegative. Since for discounted and average cost criteria the cost function can be shifted by
adding any constant, boundedness below and nonnegativity of $c$ are equivalent assumptions. We consider
Assumption (Wu) from Feinberg and Lewis [14] without assuming that $\mathbb{X}$ is locally compact.
Assumption (Wu). 

(i) $c$ is inf-compact on $\mathrm{Gr}(A)$;

(ii) Assumption (W)(iii) holds.

Assumption (W*).

(i) Assumption (W)(i) holds;

(ii) if a sequence $\{x_n\}_{n=1,2,\ldots}$ with values in $\mathbb{X}$ converges and its limit $x$ belongs to $\mathbb{X}$, then any
sequence $\{a_n\}_{n=1,2,\ldots}$ with $a_n \in A(x_n)$, $n = 1, 2, \ldots$, satisfying the condition that the sequence
$\{c(x_n, a_n)\}_{n=1,2,\ldots}$ is bounded above, has a limit point $a \in A(x)$;

(iii) Assumption (W)(iii) holds.

Lemma 3.2. The following statements hold:

(i) Assumption (W) implies Assumption (W*);

(ii) Assumption (Wu) implies Assumption (W*).

Proof. (i) Let $x_n \to x$ as $n \to \infty$, where $x \in \mathbb{X}$ and $x_n \in \mathbb{X}$, $n = 1, 2, \ldots$. We show that
under Assumption (W)(ii) any sequence $\{a_n\}_{n=1,2,\ldots}$ with $a_n \in A(x_n)$ has a limit point $a \in A(x)$.
Indeed, since $\mathcal{K} := (\cup_{n \ge 1} \{x_n\}) \cup \{x\}$ is a compact set and the set-valued mapping $A : \mathbb{X} \to K(\mathbb{A})$
is upper semi-continuous, Berge [3, Theorem 3 on p. 110] implies that the image
$A(\mathcal{K})$ is also compact. As $\{a_n\}_{n \ge 1} \subset A(\mathcal{K})$, the sequence $\{a_n\}_{n \ge 1}$ has a limit point $a \in \mathbb{A}$.
Consider a sequence $n_k \to \infty$ such that $a_{n_k} \to a$. Since $A(z) \in K(\mathbb{A})$ for all $z \in \mathbb{X}$, the
upper semi-continuous set-valued mapping $A$ is closed and, since $A$ is closed, $a \in A(x)$; Berge [3,
Theorems 5 and 6 on pp. 111, 112].

(ii) Since $c$ is inf-compact, it is lower semi-continuous and bounded below. We just need to
show that Assumption (W*)(ii) holds. Let us consider $x_n \to x$ as $n \to +\infty$ and $a_n \in A(x_n)$,
$n = 1, 2, \ldots$, such that $x_n, x \in \mathbb{X}$ and for some $\lambda < \infty$ the inequality $c(x_n, a_n) \le \lambda$ holds for all
$n = 1, 2, \ldots$. Then, by inf-compactness of $c$ on $\mathrm{Gr}(A)$, the level set $\mathcal{D}_c(\lambda)$ is compact. Thus the
sequence $\{(x_n, a_n)\}_{n \ge 1}$ has a limit point $(x, a) \in \mathcal{D}_c(\lambda) \subset \mathrm{Gr}(A)$. Since $(x, a) \in \mathrm{Gr}(A)$, we have
$a \in A(x)$. □

For any $\alpha \ge 0$ and lower semi-continuous nonnegative function $u : \mathbb{X} \to \overline{\mathbb{R}}$, we consider an
operation $\eta_u^\alpha$,

$$\eta_u^\alpha(x, a) = c(x, a) + \alpha \int_{\mathbb{X}} u(y)\, q(dy|x, a), \quad (x, a) \in \mathrm{Gr}(A). \tag{3.8}$$

Let $L(\mathbb{X})$ be the class of all lower semi-continuous and bounded below functions $\varphi : \mathbb{X} \to \overline{\mathbb{R}}$
with $\mathrm{dom}\,\varphi := \{x \in \mathbb{X} : \varphi(x) < +\infty\} \ne \emptyset$. Observe that $\eta_u^\alpha = \eta_{\alpha u}^1$.

Lemma 3.3. For any $x \in \mathbb{X}$ the following statements hold:

(a) under Assumption (W*)(ii), the function $c(x, \cdot)$ is inf-compact on $A(x)$;

(b) under Assumptions (W*)(ii) and (W*)(iii), for any $u \in L(\mathbb{X})$ and $\alpha \ge 0$, the function $\eta_u^\alpha(x, \cdot)$ is
inf-compact on $A(x)$.

Proof. (a) For an arbitrary $\lambda \in \mathbb{R}$ and fixed $x \in \mathbb{X}$, consider the set $\mathcal{D}_{c(x,\cdot)}(\lambda) = \{a \in A(x) :
c(x, a) \le \lambda\}$. Assumption (W*)(ii) means that this set is compact. Thus, (a) is proved.

(b) Fix $x \in \mathbb{X}$ again. Since $u \in L(\mathbb{X})$ and $q$ is weakly continuous in $a$, the second summand in
(3.8) is a lower semi-continuous function on $A(x)$ (Hernández-Lerma and Lasserre [17, p. 185]) and
it is bounded below by $\alpha$ times a lower bound of $u$. According to statement (a), $c(x, \cdot)$ is inf-compact on
$A(x)$. The sum of an inf-compact function and a bounded below lower semi-continuous function
is an inf-compact function. □

A measurable mapping $\phi : \mathbb{X} \to \mathbb{A}$ such that $\phi(x) \in A(x)$ for all $x \in \mathbb{X}$ is called a selector
(or a measurable selector). In our case, selectors and decision rules are the same objects. Since
we identify a stationary policy with a decision rule, selectors and stationary policies are the same
objects. The existence of a selector for the mapping $A$ is the necessary and sufficient condition for
the existence of a policy. Let $E \subseteq \mathbb{X} \times \mathbb{A}$ and let $\mathrm{proj}_{\mathbb{X}} E = \{x \in \mathbb{X} : (x, a) \in E \text{ for some } a \in \mathbb{A}\}$
be the projection of $E$ on $\mathbb{X}$. A Borel map $f : \mathrm{proj}_{\mathbb{X}} E \to \mathbb{A}$ is called a Borel uniformization of $E$
if $(x, f(x)) \in E$ for all $x \in \mathrm{proj}_{\mathbb{X}} E$. Let $E_x = \{a : (x, a) \in E\}$ be the cut of $E$ at $x \in \mathbb{X}$.

Arsenin-Kunugui Theorem (Kechris [19, p. 297]). If $E$ is a Borel subset of $\mathbb{X} \times \mathbb{A}$ and
$E_x \in K_\sigma(\mathbb{A})$ for all $x \in \mathbb{X}$, then there exists a Borel uniformization of $E$ and $\mathrm{proj}_{\mathbb{X}} E$ is a Borel set.

We remark that it is assumed in Kechris [19, p. 297] that $\mathbb{X}$ is a standard Borel space (that is,
isomorphic to a Borel subset of a Polish space) and $\mathbb{A}$ is a Polish space. Here $\mathbb{X}$ and $\mathbb{A}$ are Borel
subsets of Polish spaces. These two formulations are obviously equivalent.

We recall that $\mathrm{Gr}(A)$ is assumed to be Borel and $A(x) \ne \emptyset$, $x \in \mathbb{X}$. With $E = \mathrm{Gr}(A)$, the
Arsenin-Kunugui Theorem implies the existence of a stationary policy under the assumption $A(x) \in K(\mathbb{A})$,
$x \in \mathbb{X}$. Thus, Assumption (W) implies the existence of a policy for the MDP.

Let Assumption (W*) hold. Set $F(x) = \{a \in A(x) : c(x, a) < \infty\}$, $x \in \mathbb{X}$. In view
of Lemma 3.3, $F(x) = \cup_{n \in \{1, 2, \ldots\}} \mathcal{D}_{c(x,\cdot)}(n) \in K_\sigma(\mathbb{A})$. In addition, $\mathrm{Gr}(F) = \{(x, a) \in \mathrm{Gr}(A) :
c(x, a) < \infty\}$ is a Borel subset of $\mathbb{X} \times \mathbb{A}$. Thus, if the function $c$ takes only finite values, a stationary
policy exists in view of the Arsenin-Kunugui Theorem.

Of course, if it is possible that $c(x, a) = \infty$, a uniformization may not exist. For example,
this takes place when $c(x, a) = \infty$ for all $(x, a) \in \mathrm{Gr}(A)$ and $\mathrm{Gr}(A)$ does not have a measurable
selector. However, $c(x, a) = \infty$ means from a modeling perspective that this state-action pair
should be excluded, because selecting $a$ in $x$ leads to the worst possible result. If there are
state-action pairs $(x, a)$ with $c(x, a) = \infty$ and $\mathrm{Gr}(A)$ does not have a uniformization, the MDP can be
transformed into an MDP modeling the same problem and with a nonempty set of policies. Let
us exclude the situation when $c(x, a) = \infty$ for all $(x, a) \in \mathrm{Gr}(A)$, because it is trivial: all the
actions are bad. Define $\tilde{\mathbb{X}} = \mathrm{proj}_{\mathbb{X}} \mathrm{Gr}(F)$ and $Y = \mathbb{X} \setminus \tilde{\mathbb{X}}$. Under Assumption (W*), the
Arsenin-Kunugui Theorem implies that $\tilde{\mathbb{X}}$ is Borel and there exists a Borel mapping $f$ from $\tilde{\mathbb{X}}$ to $\mathbb{A}$ such that
$f(x) \in F(x)$ for all $x \in \tilde{\mathbb{X}}$. If $Y = \emptyset$ (that is, there exists an action $a \in A(x)$ with $c(x, a) < \infty$
for each $x \in \mathbb{X}$) then $\phi = f$ is a stationary policy.

Let us consider the situation when $Y \ne \emptyset$. In such an MDP, as soon as the state is in $Y$, the
losses are infinite and there is no reason to model the process after this. Let us transform the model
by choosing any $x^* \in Y$ and any $a^* \in \mathbb{A}$ and setting the new state set $\mathbb{X}^* = \tilde{\mathbb{X}} \cup \{x^*\}$, keeping
the original action set $\mathbb{A}$, setting new action sets $A^*(x) = F(x)$ for $x \in \tilde{\mathbb{X}}$ and $A^*(x^*) = \{a^*\}$,
defining the new cost function

$$c^*(x, a) = \begin{cases} c(x, a), & \text{if } x \in \tilde{\mathbb{X}} \text{ and } a \in F(x), \\ \infty, & \text{if } x = x^* \text{ and } a = a^*, \end{cases}$$

and considering new transition probabilities defined for $x \in \mathbb{X}^*$ and $a \in A^*(x)$ by

$$q^*(B|x, a) = \begin{cases} q(B|x, a), & \text{if } B \subseteq \tilde{\mathbb{X}},\ B \in \mathcal{B}(\mathbb{X}), \text{ and } x \in \tilde{\mathbb{X}}, \\ q(Y|x, a), & \text{if } B = \{x^*\} \text{ and } x \in \tilde{\mathbb{X}}, \\ 1, & \text{if } B = \{x^*\} \text{ and } x = x^*. \end{cases}$$

The new MDP is nontrivial in the sense that the set of policies is not empty. Finding an optimal
policy for this MDP is equivalent to finding a policy for the original MDP until its first exit time
from $\tilde{\mathbb{X}}$, and in both cases the process incurs infinite losses if it leaves $\tilde{\mathbb{X}}$. So, the original and the
new MDP model the same problem.
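
For a finite model, the transformation described above amounts to a small amount of bookkeeping. The sketch below is an illustration under the assumption of finite state and action sets; the function name, the dictionary layout, and the three-state example are hypothetical. It keeps only finite-cost actions, collapses the states without such actions into one absorbing state, and moves the transition mass of those states onto it.

```python
import math


def remove_infinite_costs(states, actions, cost, trans):
    """Finite-model sketch of the transformation: restrict actions to F(x),
    and replace the states without finite-cost actions by one absorbing state."""
    # F(x): actions with a finite one-step cost.
    finite_actions = {x: [a for a in actions[x] if cost[(x, a)] < math.inf] for x in states}
    kept = [x for x in states if finite_actions[x]]     # states with some finite-cost action
    bad = [x for x in states if not finite_actions[x]]  # the set Y

    new_states = list(kept)
    new_actions = {x: finite_actions[x] for x in kept}
    new_cost = {(x, a): cost[(x, a)] for x in kept for a in finite_actions[x]}
    new_trans = {}

    sink = bad[0] if bad else None  # x*: any state of Y
    for x in kept:
        for a in finite_actions[x]:
            row = {y: p for y, p in trans[(x, a)].items() if y in kept}
            mass_on_bad = sum(p for y, p in trans[(x, a)].items() if y in bad)
            if mass_on_bad > 0:
                row[sink] = mass_on_bad  # q*({x*} | x, a) = q(Y | x, a)
            new_trans[(x, a)] = row

    if sink is not None:
        sink_action = actions[sink][0]  # a*: any action
        new_states.append(sink)
        new_actions[sink] = [sink_action]
        new_cost[(sink, sink_action)] = math.inf
        new_trans[(sink, sink_action)] = {sink: 1.0}  # x* is absorbing
    return new_states, new_actions, new_cost, new_trans


# Hypothetical example: states 1 and 2 only have infinite-cost actions.
states = [0, 1, 2]
actions = {0: ["a", "b"], 1: ["a"], 2: ["a"]}
cost = {(0, "a"): 1.0, (0, "b"): math.inf, (1, "a"): math.inf, (2, "a"): math.inf}
trans = {
    (0, "a"): {0: 0.5, 1: 0.25, 2: 0.25},
    (0, "b"): {0: 1.0},
    (1, "a"): {1: 1.0},
    (2, "a"): {0: 1.0},
}
new_states, new_actions, new_cost, new_trans = remove_infinite_costs(states, actions, cost, trans)
```

In the transformed example, state 0 keeps only action "a", and the probability 0.25 + 0.25 of entering states 1 or 2 becomes a probability 0.5 of entering the single absorbing state.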

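Lemma 3.4. If Assumption (W*) holds and $u \in L(\mathbb{X})$, then the function

$$u^*(x) := \inf_{a \in A(x)} \left[ c(x, a) + \int_{\mathbb{X}} u(y)\, q(dy|x, a) \right], \quad x \in \mathbb{X}, \tag{3.9}$$

belongs to $L(\mathbb{X})$, and there exists $f \in \mathbb{F}$ such that

$$u^*(x) = c(x, f(x)) + \int_{\mathbb{X}} u(y)\, q(dy|x, f(x)), \quad x \in \mathbb{X}. \tag{3.10}$$

Moreover, the infimum in (3.9) can be replaced by a minimum, and the nonempty sets

$$A_*(x) = \left\{ a \in A(x) : u^*(x) = c(x, a) + \int_{\mathbb{X}} u(y)\, q(dy|x, a) \right\}, \quad x \in \mathbb{X}, \tag{3.11}$$

satisfy the following properties:

(a) the graph $\mathrm{Gr}(A_*) = \{(x, a) : x \in \mathbb{X},\ a \in A_*(x)\}$ is a Borel subset of $\mathbb{X} \times \mathbb{A}$;

(b) if $u^*(x) = +\infty$, then $A_*(x) = A(x)$, and, if $u^*(x) < +\infty$, then $A_*(x)$ is compact.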

Proof. Under Assumption (W*), for any lower semi-continuous on $\mathbb{X}$, bounded below function
$u : \mathbb{X} \to \overline{\mathbb{R}}$ and $\alpha \in (0, 1]$, the function $\eta_u^\alpha(x, \cdot)$ is inf-compact on $A(x)$, $x \in \mathbb{X}$. This follows from
Lemma 3.3. Thus, the infimum in (3.9) can be replaced by a minimum and $A_*(x)$ is nonempty for any
$x \in \mathbb{X}$.

Now we show that $u^*$ is lower semi-continuous on $\mathbb{X}$. Let us fix an arbitrary $x \in \mathbb{X}$ and any
sequence $x_n \to x$ as $n \to +\infty$. We need to prove the inequality

$$u^*(x) \le \liminf_{n \to +\infty} u^*(x_n). \tag{3.12}$$

If $\liminf_{n \to +\infty} u^*(x_n) = +\infty$, then (3.12) obviously holds. Thus we consider the case when
$\liminf_{n \to +\infty} u^*(x_n) < +\infty$. There exists a subsequence $\{x_{n_k}\}_{k \ge 1} \subseteq \{x_n\}_{n \ge 1}$ such that

$$\liminf_{n \to +\infty} u^*(x_n) = \lim_{k \to +\infty} u^*(x_{n_k}).$$

Setting $\lambda = \lim_{k \to +\infty} u^*(x_{n_k}) + 1$, we get the inequality $u^*(x_{n_k}) \le \lambda$ for all $k \ge K$, where $K$ is some
natural number. Since, for each $x \in \mathbb{X}$, the function $\eta_u^1(x, \cdot)$ is inf-compact on $A(x)$, equation (3.9) can be
rewritten as

$$u^*(x) = \min_{a \in A(x)} \eta_u^1(x, a), \quad x \in \mathbb{X}.$$

Thus, for any $k \ge K$ there exists $a_k \in A(x_{n_k})$ such that $u^*(x_{n_k}) = \eta_u^1(x_{n_k}, a_k)$. Therefore,

$$c(x_{n_k}, a_k) \le \eta_u^1(x_{n_k}, a_k) \le \lambda, \quad k \ge K.$$

In view of Assumption (W*)(ii), there exists a convergent subsequence $\{a_{k_m}\}_{m \ge 1}$ of the sequence
$\{a_k\}_{k \ge 1}$ such that $a_{k_m} \to a \in A(x)$ as $m \to +\infty$. Due to lower semi-continuity of $\eta_u^1$ on $\mathrm{Gr}(A)$,

$$\liminf_{n \to +\infty} u^*(x_n) = \lim_{k \to +\infty} u^*(x_{n_k}) = \lim_{m \to +\infty} u^*(x_{n_{k_m}}) = \lim_{m \to +\infty} \eta_u^1(x_{n_{k_m}}, a_{k_m}) \ge \eta_u^1(x, a) \ge u^*(x).$$

Inequality (3.12) holds. Thus, $u^*$ is lower semi-continuous on $\mathbb{X}$.

Now we consider the nonempty sets $A_*(x)$, $x \in \mathbb{X}$, defined in (3.11). The graph $\mathrm{Gr}(A_*)$ is a
Borel subset of $\mathbb{X} \times \mathbb{A}$, because $\mathrm{Gr}(A_*) = \{(x, a) \in \mathrm{Gr}(A) : u^*(x) = \eta_u^1(x, a)\}$, and the functions $\eta_u^1$ and
$u^*$ are lower semi-continuous on $\mathrm{Gr}(A)$ and $\mathbb{X}$ respectively, and therefore they are Borel.

We remark that, if $u^*(x) = +\infty$, then $A_*(x) = A(x)$. If $u^*(x) < \infty$, then Lemma 3.3 implies that
the set $A_*(x)$ is compact. Indeed, fix any $x \in \mathbb{X}_f := \{x \in \mathbb{X} : u^*(x) < \infty\}$ and set $\lambda = u^*(x)$.
Then the set $A_*(x) = \{a \in A(x) : \eta_u^1(x, a) \le \lambda\} = \mathcal{D}_{\eta_u^1(x,\cdot)}(\lambda)$ is compact, because $\eta_u^1(x, \cdot)$ is
inf-compact on $A(x)$.

Let us prove the existence of $f \in \mathbb{F}$ satisfying (3.10). Since the function $u^*$ is lower
semi-continuous, it is Borel and the sets $\mathbb{X}_\infty := \{x \in \mathbb{X} : u^*(x) = +\infty\}$ and $\mathbb{X}_f$ are Borel.
Therefore, the graph of the mapping $A_*$ restricted to $\mathbb{X}_f$ is the Borel set $\mathrm{Gr}(A_*) \setminus (\mathbb{X}_\infty \times \mathbb{A})$. Since the
nonempty sets $A_*(x)$ are compact for all $x \in \mathbb{X}_f$, the Arsenin-Kunugui Theorem implies the
existence of a Borel selector $f_1 : \mathbb{X}_f \to \mathbb{A}$ such that $f_1(x) \in A_*(x)$ for all $x \in \mathbb{X}_f$. Consider any Borel
mapping $f_2$ from $\mathbb{X}$ to $\mathbb{A}$ satisfying $f_2(x) \in A(x)$ for all $x \in \mathbb{X}$ and set

$$f(x) = \begin{cases} f_1(x), & \text{if } x \in \mathbb{X}_f, \\ f_2(x), & \text{if } x \in \mathbb{X}_\infty. \end{cases}$$

Then $f \in \mathbb{F}$ and $f(x) \in A_*(x)$ for all $x \in \mathbb{X}$. □

The following Lemma 3.5 is formulated in Schäl [24, Lemma 2.3(ii)] without proof. Reference
Serfozo [27] mentioned in Schäl [24, Lemma 2.3(ii)] contains relevant facts, but it does not contain
this statement. Therefore we provide the proof. Recall that for a metric space $S$, the family of all
probability measures on $(S, \mathcal{B}(S))$ is denoted by $\mathbb{P}(S)$.

Lemma 3.5. Let $S$ be an arbitrary metric space, let $\{\mu_n\}_{n \ge 1} \subset \mathbb{P}(S)$ converge weakly to $\mu \in \mathbb{P}(S)$,
and let $\{h_n\}_{n \ge 1}$ be a sequence of measurable nonnegative $\overline{\mathbb{R}}$-valued functions on $S$. Then

$$\int_S \underline{h}(s)\, \mu(ds) \le \liminf_{n \to +\infty} \int_S h_n(s)\, \mu_n(ds),$$

where $\underline{h}(s) = \liminf_{n \to +\infty,\ s' \to s} h_n(s')$, $s \in S$.

Proof. See Appendix A. □

We remark that $\liminf_{n \to +\infty,\ s' \to s} h_n(s')$ is the least upper bound of the set of all $\lambda \in \mathbb{R}$ such that there
exist $N = 1, 2, \ldots$ and a neighborhood $U(s)$ of $s$ such that $\lambda \le \inf\{h_n(s') : n \ge N,\ s' \in U(s)\}$.



4 Expected Total Discounted Costs 

In this section, we establish under Assumption (W*) the standard properties of discounted MDPs:
the existence of stationary optimal policies, a description of the sets of stationary optimal policies,
and convergence of value iterations. Theorem 4.1 strengthens Feinberg and Lewis [14, Proposition
3.1], where these facts are proved under Assumption (Wu). In terms of applications to inventory
and queueing control, Assumption (W*) does not require that holding costs increase to infinity as
the inventory level (or workload, or the number of customers in queue) increases to infinity.

Theorem 4.1. Let Assumption (W*) hold. Then

(i) the functions $v_{n,\alpha}$, $n = 1, 2, \ldots$, and $v_\alpha$ are lower semi-continuous on $\mathbb{X}$, and $v_{n,\alpha}(x) \uparrow v_\alpha(x)$
as $n \to +\infty$ for all $x \in \mathbb{X}$;

(ii)

$$v_{n+1,\alpha}(x) = \min_{a \in A(x)} \left\{ c(x, a) + \alpha \int_{\mathbb{X}} v_{n,\alpha}(y)\, q(dy|x, a) \right\}, \quad x \in \mathbb{X},\ n = 0, 1, \ldots, \tag{4.1}$$

where $v_{0,\alpha}(x) = 0$ for all $x \in \mathbb{X}$, and the nonempty sets $A_{n,\alpha}(x) := \{a \in A(x) : v_{n+1,\alpha}(x) =
\eta_{v_{n,\alpha}}^\alpha(x, a)\}$, $x \in \mathbb{X}$, $n = 0, 1, \ldots$, satisfy the following properties: (a) the graph $\mathrm{Gr}(A_{n,\alpha}) =
\{(x, a) : x \in \mathbb{X},\ a \in A_{n,\alpha}(x)\}$, $n = 0, 1, \ldots$, is a Borel subset of $\mathbb{X} \times \mathbb{A}$, and (b) if $v_{n+1,\alpha}(x) = +\infty$,
then $A_{n,\alpha}(x) = A(x)$ and, if $v_{n+1,\alpha}(x) < +\infty$, then $A_{n,\alpha}(x)$ is compact;

(iii) for any $N = 1, 2, \ldots$, there exists a Markov optimal $N$-horizon policy $(\phi_0, \ldots, \phi_{N-1})$ and
if, for an $N$-horizon Markov policy $(\phi_0, \ldots, \phi_{N-1})$ the inclusions $\phi_{N-1-n}(x) \in A_{n,\alpha}(x)$, $x \in \mathbb{X}$,
$n = 0, \ldots, N - 1$, hold then this policy is $N$-horizon optimal;

(iv) for $\alpha \in [0, 1)$

$$v_\alpha(x) = \min_{a \in A(x)} \left\{ c(x, a) + \alpha \int_{\mathbb{X}} v_\alpha(y)\, q(dy|x, a) \right\}, \quad x \in \mathbb{X}, \tag{4.2}$$

and the nonempty sets $A_\alpha(x) := \{a \in A(x) : v_\alpha(x) = \eta_{v_\alpha}^\alpha(x, a)\}$, $x \in \mathbb{X}$, satisfy the following
properties: (a) the graph $\mathrm{Gr}(A_\alpha) = \{(x, a) : x \in \mathbb{X},\ a \in A_\alpha(x)\}$ is a Borel subset of $\mathbb{X} \times \mathbb{A}$, and
(b) if $v_\alpha(x) = +\infty$, then $A_\alpha(x) = A(x)$ and, if $v_\alpha(x) < +\infty$, then $A_\alpha(x)$ is compact;

(v) for an infinite horizon there exists a stationary discount-optimal policy $\phi_\alpha$, and a stationary
policy is optimal if and only if $\phi_\alpha(x) \in A_\alpha(x)$ for all $x \in \mathbb{X}$;

(vi) (Feinberg and Lewis [14, Proposition 3.1(iv)]) under Assumption (Wu), the functions $v_{n,\alpha}$,
$n = 1, 2, \ldots$, and $v_\alpha$ are inf-compact on $\mathbb{X}$.

Proof. (i)-(v). First, we prove these statements for a nonnegative cost function $c$. In this case,
$v_{n,\alpha}(x) \ge 0$, $n = 0, 1, \ldots$, and $v_\alpha(x) \ge 0$ for all $x \in \mathbb{X}$.

By (2.3) and Lemma 3.4, $v_{1,\alpha} \in L(\mathbb{X})$, since $v_{0,\alpha} = 0 \in L(\mathbb{X})$. By the same arguments, if
$v_{n,\alpha} \in L(\mathbb{X})$ then $v_{n+1,\alpha} \in L(\mathbb{X})$. Thus $v_{n,\alpha} \in L(\mathbb{X})$ for all $n = 0, 1, \ldots$. By Lemma 3.3,
for any $n = 1, 2, \ldots$, $x \in \mathbb{X}$, and $\lambda \in \mathbb{R}$, the set $\mathcal{D}_{\eta_{v_{n,\alpha}}^\alpha(x,\cdot)}(\lambda)$ is a compact subset of $\mathbb{A}$. By
Bertsekas and Shreve [4, Proposition 9.17], $v_{n,\alpha} \uparrow v_\alpha$ as $n \to +\infty$. Since the limit of a monotone
increasing sequence of lower semi-continuous functions is again a lower semi-continuous function,
$v_\alpha \in L(\mathbb{X})$. Lemma 3.4, applied to equations (2.3) and (2.5), implies statements (ii) and (iv)
respectively. Statement (iii) follows from (2.4) and statement (v) follows from (2.6).

Now let $c(x, a) \ge K$ for all $(x, a) \in \mathrm{Gr}(A)$ and for some $K > -\infty$. For $K \ge 0$, statements
(i)-(v) are proved. For $K < 0$, consider the cost function $\tilde{c} = c - K \ge 0$. If the cost function $c$ is
substituted with $\tilde{c}$, we substitute the notation $v$ with $\tilde{v}$. Then $v_{n,\alpha}^\pi = \tilde{v}_{n,\alpha}^\pi + \frac{1 - \alpha^n}{1 - \alpha} K$, $n = 0, 1, \ldots$,
for all policies $\pi$. Thus, $v_{n,\alpha} = \tilde{v}_{n,\alpha} + \frac{1 - \alpha^n}{1 - \alpha} K$, $n = 0, 1, \ldots$, and $v_\alpha = \tilde{v}_\alpha + \frac{K}{1 - \alpha}$. Since statements
(i)-(v) hold for the shifted costs $\tilde{c}$ and the value functions $\tilde{v}_{n,\alpha}$ and $\tilde{v}_\alpha$, they also hold for the initial
cost function $c$ and the value functions $v_{n,\alpha}$ and $v_\alpha$. □

We remark that the conclusions of Theorem 4.1 and its proof remain correct when $\alpha = 1$ and
the function $c$ is nonnegative.
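The recursion (4.1) and the minimizer sets $A_{n,\alpha}(x)$ can be illustrated on a finite model, where integrals reduce to sums. The following Python sketch is only an illustration: the two-state model, its costs, and all names (`STATES`, `COST`, `Q`, `bellman_step`) are hypothetical and are not taken from the paper.

```python
# Finite-state illustration of (4.1):
#   v_{n+1}(x) = min_a [ c(x, a) + alpha * sum_y v_n(y) q(y | x, a) ].
# The two-state model below is hypothetical and chosen only for illustration.

STATES = [0, 1]
ACTIONS = {0: ["stay", "move"], 1: ["stay", "move"]}          # action sets A(x)
COST = {(0, "stay"): 1.0, (0, "move"): 3.0,
        (1, "stay"): 2.0, (1, "move"): 0.5}                   # nonnegative costs
# Q[(x, a)] maps a next state y to the probability q(y | x, a).
Q = {(0, "stay"): {0: 0.9, 1: 0.1}, (0, "move"): {0: 0.2, 1: 0.8},
     (1, "stay"): {0: 0.1, 1: 0.9}, (1, "move"): {0: 0.7, 1: 0.3}}


def eta(v, alpha, x, a):
    """eta_v^alpha(x, a) = c(x, a) + alpha * sum_y v(y) q(y | x, a)."""
    return COST[(x, a)] + alpha * sum(p * v[y] for y, p in Q[(x, a)].items())


def bellman_step(v, alpha, tol=1e-12):
    """One step of (4.1); returns v_{n+1} and the minimizer sets A_n(x)."""
    new_v, argmins = {}, {}
    for x in STATES:
        values = {a: eta(v, alpha, x, a) for a in ACTIONS[x]}
        new_v[x] = min(values.values())
        argmins[x] = [a for a, val in values.items() if val <= new_v[x] + tol]
    return new_v, argmins


def value_iteration(alpha, n_steps):
    """Return the list [v_0, ..., v_n] with v_0 = 0 and the last minimizer sets."""
    v = {x: 0.0 for x in STATES}
    history, argmins = [v], None
    for _ in range(n_steps):
        v, argmins = bellman_step(v, alpha)
        history.append(v)
    return history, argmins


history, minimizers = value_iteration(alpha=0.9, n_steps=500)
print(history[-1], minimizers)
```

Since the costs are nonnegative, the iterates are nondecreasing in $n$, in line with statement (i), and the last iterate is numerically a fixed point of (4.2).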

5 Average Costs Per Unit Time 

In this section we show that Assumption (W*) and the boundedness Assumption ($\underline{B}$) on
the function $u_\alpha$, which is weaker than the boundedness Assumption (B) introduced by Schäl [24], lead to
the validity of stationary average-cost optimality inequalities and the existence of stationary optimal policies.
Stronger results hold under Assumption (B).

Assumption ($\underline{B}$). (i) Assumption (G) holds, and (ii) $\liminf_{\alpha \uparrow 1} u_\alpha(x) < \infty$ for all $x \in X$.

Assumption ($\underline{B}$)(ii) is weaker than the assumption $\sup_{\alpha \in [0,1)} u_\alpha(x) < \infty$ for all $x \in X$
considered in Schäl [24]. This assumption and Assumption (G) were combined in Feinberg and
Lewis [14] into the following assumption.

Assumption (B). (i) Assumption (G) holds, and (ii) $\sup_{\alpha \in [0,1)} u_\alpha(x) < \infty$ for all $x \in X$.

It seems natural to consider the assumption $\limsup_{\alpha \uparrow 1} u_\alpha(x) < \infty$ for all $x \in X$, which is stronger
than Assumption ($\underline{B}$)(ii) and weaker than Assumption (B)(ii). However, as the following lemma
shows, under Assumption (G) this assumption is equivalent to Assumption (B)(ii).

Lemma 5.1. Let the cost function $c$ be bounded below and Assumption (G) hold. Then for each
$x \in X$ the following two inequalities are equivalent:

(i) $\sup_{\alpha \in [0,1)} u_\alpha(x) < \infty$,

(ii) $\limsup_{\alpha \uparrow 1} u_\alpha(x) < \infty$.

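Proof. Obviously, (i) $\Rightarrow$ (ii). Let us prove (ii) $\Rightarrow$ (i). Let (ii) hold. Assume that (i) does not hold.
Since $\sup_{\alpha \in [0,1)} u_\alpha(x) = \max\{\sup_{\alpha \in [0,\alpha^*)} u_\alpha(x),\ \sup_{\alpha \in [\alpha^*,1)} u_\alpha(x)\}$ for any $\alpha^* \in [0,1)$,
and (ii) implies that the second supremum is finite for some $\alpha^*$, there exists $\alpha^* \in [0,1)$ such that
$\sup_{\alpha \in [0,\alpha^*)} u_\alpha(x) = \infty$.

Since the function $u_\alpha$ remains unchanged if a finite constant is added to the cost function $c$,
we assume without loss of generality that $c(x,a) \ge 0$ for all $(x,a) \in \mathrm{Gr}(A)$. Since $c \ge 0$, the
functions $v_\alpha(x)$ and $m_\alpha$ are nonnegative nondecreasing functions in $\alpha \in [0,1)$. Since
$v_\alpha(x) = u_\alpha(x) + m_\alpha \ge u_\alpha(x)$, we have $\sup_{\alpha \in [0,\alpha^*)} v_\alpha(x) = \infty$ and therefore $v_\alpha(x) = \infty$ for all
$\alpha \in [\alpha^*,1)$, because of the monotonicity of $v_\alpha$ in $\alpha$. Thus, $\limsup_{\alpha \uparrow 1} (1-\alpha) v_\alpha(x) = \infty$. However,

$$\limsup_{\alpha \uparrow 1} (1-\alpha) v_\alpha(x) = \limsup_{\alpha \uparrow 1} (1-\alpha)\big(u_\alpha(x) + m_\alpha\big) \le \limsup_{\alpha \uparrow 1} (1-\alpha) u_\alpha(x) + \overline{w} < \infty,$$

where the last inequality follows from (ii) and (3.1). The obtained contradiction completes the proof. □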
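The quantities $m_\alpha$, $u_\alpha = v_\alpha - m_\alpha$, and $(1-\alpha) m_\alpha$ can be computed explicitly for small models. The sketch below uses a hypothetical two-state deterministic model (not from the paper) in which $v_\alpha(0) = \min\{1/(1-\alpha), 2\}$ and $v_\alpha(1) = 0$, so that $\sup_{\alpha \in [0,1)} u_\alpha(0) = 2 < \infty$. All names are illustrative assumptions.

```python
# Hypothetical two-state model: in state 0, "stay" costs 1 and remains in 0,
# "invest" costs 2 and moves to state 1; state 1 is absorbing with zero cost.
COST = {(0, "stay"): 1.0, (0, "invest"): 2.0, (1, "rest"): 0.0}
NEXT = {(0, "stay"): 0, (0, "invest"): 1, (1, "rest"): 1}    # deterministic transitions
ACTIONS = {0: ["stay", "invest"], 1: ["rest"]}


def discounted_values(alpha, tol=1e-12, max_iter=1_000_000):
    """Approximate v_alpha by value iteration started from v = 0."""
    v = {0: 0.0, 1: 0.0}
    for _ in range(max_iter):
        new_v = {x: min(COST[(x, a)] + alpha * v[NEXT[(x, a)]] for a in ACTIONS[x])
                 for x in ACTIONS}
        if max(abs(new_v[x] - v[x]) for x in v) < tol:
            return new_v
        v = new_v
    raise RuntimeError("value iteration did not converge")


def normalized(alpha):
    """Return m_alpha = min_x v_alpha(x) and u_alpha = v_alpha - m_alpha."""
    v = discounted_values(alpha)
    m = min(v.values())
    return m, {x: v[x] - m for x in v}


for alpha in (0.0, 0.25, 0.5, 0.9, 0.99, 0.999):
    m_alpha, u_alpha = normalized(alpha)
    print(f"alpha={alpha}: (1-alpha)*m_alpha={(1 - alpha) * m_alpha:.3f}, "
          f"u_alpha(0)={u_alpha[0]:.3f}")
```

In this model $u_\alpha(0)$ increases with $\alpha$ up to the value 2 and then stays there, so conditions (i) and (ii) of Lemma 5.1 hold simultaneously.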

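Until the end of this section we assume that Assumption ($\underline{B}$) holds. Let us set

$$u(x) := \liminf_{\alpha \uparrow 1,\ y \to x} u_\alpha(y), \quad x \in X, \tag{5.1}$$

where $\liminf_{\alpha \uparrow 1,\ y \to x} u_\alpha(y)$ is the least upper bound of the set of all $\lambda \in \mathbb{R}_+$ such that there exist
$\beta \in [0,1)$ and a neighborhood $U(x)$ of $x$ such that $\lambda \le \inf\{u_\alpha(y) : \alpha \in [\beta,1),\ y \in U(x) \cap X\}$.
Also define the following nonnegative functions on $X$:

$$U_\beta(x) = \inf_{\alpha \in [\beta,1)} u_\alpha(x), \qquad \underline{u}_\beta(x) = \liminf_{y \to x} U_\beta(y), \qquad \beta \in [0,1),\ x \in X. \tag{5.2}$$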

Observe that all the three defined functions take finite values at $x \in X$. Indeed,

$$\underline{u}_\beta(x) \le U_\beta(x) \le \sup_{\beta \in [0,1)} \inf_{\alpha \in [\beta,1)} u_\alpha(x) = \liminf_{\alpha \uparrow 1} u_\alpha(x) < \infty, \quad \beta \in [0,1),\ x \in X, \tag{5.3}$$

where the first two inequalities follow from the definitions of $\underline{u}_\beta$ and $U_\beta$ respectively, and the last
inequality follows from Assumption ($\underline{B}$). For $x \in X$

$$u(x) = \sup_{\beta \in [0,1),\ R > 0}\ \inf_{\alpha \in [\beta,1),\ y \in B_R(x)} u_\alpha(y) = \sup_{\beta \in [0,1)} \sup_{R > 0} \inf_{y \in B_R(x)} \inf_{\alpha \in [\beta,1)} u_\alpha(y)$$
$$= \sup_{\beta \in [0,1)} \sup_{R > 0} \inf_{y \in B_R(x)} U_\beta(y) = \sup_{\beta \in [0,1)} \liminf_{y \to x} U_\beta(y) = \sup_{\beta \in [0,1)} \underline{u}_\beta(x) < \infty, \tag{5.4}$$

where $B_R(x) = \{y \in X : \rho(y,x) < R\}$, the first equality is (5.1), the second equality follows
from the properties of infimums, the third and the fifth equalities follow from (5.2), the fourth
equality follows from the definition of lim inf, and the inequality follows from (5.3). In view of
(5.2), the functions $U_\beta(x)$ and $\underline{u}_\beta(x)$ are nondecreasing in $\beta$. Therefore, in view of (5.4),

$$u(x) = \lim_{\beta \uparrow 1} \underline{u}_\beta(x), \quad x \in X. \tag{5.5}$$

We also set for $u$ from (5.5)

$$A^*(x) := \left\{ a \in A(x) : \overline{w} + u(x) \ge c(x,a) + \int_X u(y)\, q(dy|x,a) \right\}, \quad x \in X, \tag{5.6}$$

and let $A_*(x)$, $x \in X$, be the sets defined in (3.11) for this function $u$; $A_*(x) \subset A^*(x)$.

Theorem 5.2. Suppose Assumptions (W*) and ($\underline{B}$) hold. There exists a stationary policy $\phi$ satisfying
(3.3) with $u$ defined in (5.1). Thus, equalities (3.4) hold for this policy $\phi$. Furthermore, the
following statements hold:

(a) the function $u : X \to \mathbb{R}_+$, defined in (5.1), is lower semi-continuous;

(b) the nonempty sets $A^*(x)$, $x \in X$, satisfy the following properties:

(b1) the graph $\mathrm{Gr}(A^*) = \{(x,a) : x \in X,\ a \in A^*(x)\}$ is a Borel subset of $X \times A$;
(b2) for each $x \in X$ the set $A^*(x)$ is compact;

(c) a stationary policy $\phi$ is optimal for average costs and satisfies (3.3) with $u$ defined in (5.1), if
$\phi(x) \in A^*(x)$ for all $x \in X$;

(d) there exists a stationary policy $\phi$ with $\phi(x) \in A_*(x) \subset A^*(x)$ for all $x \in X$;

(e) if, in addition, Assumption (Wu) holds, then the function $u$, defined in (5.1), is inf-compact.

Before the proof of Theorem 5.2 we establish some auxiliary facts.

Lemma 5.3. Under Assumption ($\underline{B}$), the functions $u, \underline{u}_\alpha : X \to \mathbb{R}_+$, $\alpha \in [0,1)$, are lower semi-
continuous on $X$. If additionally Assumption (W*) holds, the functions $u_\alpha : X \to \mathbb{R}_+$, $\alpha \in [0,1)$,
are lower semi-continuous on $X$. Under Assumptions (Wu) and ($\underline{B}$), the functions $u, \underline{u}_\alpha, u_\alpha : X \to \mathbb{R}_+$,
$\alpha \in [0,1)$, are inf-compact on $X$.

Proof. Since $U_\alpha(x) \ge 0$, $\alpha \in [0,1)$ and $x \in X$, the functions $\underline{u}_\alpha$, $\alpha \in [0,1)$, are lower semi-
continuous; Feinberg and Lewis [14, Lemma 3.1]. Since the supremum over any set of lower semi-
continuous functions is a lower semi-continuous function, the function $u$ is lower semi-continuous.
According to (3.1), $\overline{w} := \limsup_{\alpha \uparrow 1} (1-\alpha) m_\alpha = \inf_{\beta \in [0,1)} \sup_{\alpha \in [\beta,1)} (1-\alpha) m_\alpha < \infty$. Thus, there exists
$\alpha_0 \in [0,1)$ such that

$$\Lambda' := \sup_{\alpha \in [\alpha_0,1)} (1-\alpha) m_\alpha < \infty. \tag{5.7}$$

Let us assume that the function $c$ is bounded below. As explained in the proof of Lemma 5.1,
without loss of generality we can assume that $c \ge 0$. Then $m_\alpha$ is a nonnegative, nondecreasing
function of $\alpha$. Thus, $(1-\alpha) m_\alpha \le (1-\alpha) m_{\alpha_0} \le \Lambda'/(1-\alpha_0)$, $\alpha \in [0,\alpha_0)$, and (5.7) implies that

$$\Lambda^* := \sup_{\alpha \in [0,1)} (1-\alpha) m_\alpha < \infty. \tag{5.8}$$

According to Theorem 4.1(i,iv,v), under Assumption (W*), the function $u_\alpha(x) = v_\alpha(x) - m_\alpha$
is lower semi-continuous, and a stationary policy $\phi_\alpha$ is $\alpha$-discount optimal if and only if for all
$x \in X$

$$v_\alpha(x) = \min_{a \in A(x)} \left\{ c(x,a) + \alpha \int_X v_\alpha(y)\, q(dy|x,a) \right\} = c(x,\phi_\alpha(x)) + \alpha \int_X v_\alpha(y)\, q(dy|x,\phi_\alpha(x)). \tag{5.9}$$

The first equality in (5.9) is equivalent to

$$(1-\alpha) m_\alpha + u_\alpha(x) = \min_{a \in A(x)} \left[ c(x,a) + \alpha \int_X u_\alpha(y)\, q(dy|x,a) \right], \quad x \in X. \tag{5.10}$$



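Let Assumption (Wu) hold. The function $u_\alpha(x) = v_\alpha(x) - m_\alpha$ is inf-compact by Theo-
rem 4.1(vi). Consider an arbitrary $\lambda \in \mathbb{R}_+$. Since $u(x) \ge \underline{u}_{\alpha_1}(x) \ge \underline{u}_{\alpha_2}(x)$, $x \in X$, for all
$\alpha_1, \alpha_2 \in [0,1)$, $\alpha_1 \ge \alpha_2$, then $D_u(\lambda) \subset D_{\underline{u}_\alpha}(\lambda) \subset D_{\underline{u}_0}(\lambda)$, $\alpha \in [0,1)$. Since the functions $u$ and
$\underline{u}_\alpha$ are lower semi-continuous, the sets $D_u(\lambda)$ and $D_{\underline{u}_\alpha}(\lambda)$ are closed, $\alpha \in [0,1)$. Therefore, if the
set $D_{\underline{u}_0}(\lambda)$ is compact then those sets are also compact and the functions $u$ and $\underline{u}_\alpha$, $\alpha \in [0,1)$, are
inf-compact.

Observe that (5.8) and (5.10) imply that $u_\alpha(x) \ge v_1(x) - \Lambda^*$, $x \in X$, for all $\alpha \in [0,1)$. This
implies $U_0(x) \ge v_1(x) - \Lambda^*$, $x \in X$. Since $\underline{u}_0$ is the largest lower semi-continuous function that is
less than or equal to $U_0$ at all $x \in X$, we have $\underline{u}_0(x) \ge v_1(x) - \Lambda^*$, $x \in X$. Since the function $\underline{u}_0$ is
lower semi-continuous, the set $D_{\underline{u}_0}(\lambda)$ is closed. In addition, $D_{\underline{u}_0}(\lambda) \subset D_{v_1}(\lambda + \Lambda^*)$, where the set
$D_{v_1}(\lambda + \Lambda^*)$ is compact. Thus, the set $D_{\underline{u}_0}(\lambda)$ is compact, and the functions $u$ and $\underline{u}_\alpha$, $\alpha \in [0,1)$,
are inf-compact. □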

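Corollary 5.4. Under Assumption ($\underline{B}$), for every sequence $\alpha_n \uparrow 1$ as $n \to +\infty$ and for every
$x \in X$,

$$u(x) = \liminf_{n \to +\infty,\ y \to x} \underline{u}_{\alpha_n}(y).$$

Proof. Let $\alpha_n \uparrow 1$ as $n \to +\infty$, and $x \in X$. Similar to (5.4),

$$\liminf_{n \to +\infty,\ y \to x} \underline{u}_{\alpha_n}(y) = \sup_{n = 1, 2, \ldots} \sup_{R > 0} \inf_{y \in B_R(x)} \inf_{m \ge n} \underline{u}_{\alpha_m}(y) = \sup_{n = 1, 2, \ldots} \sup_{R > 0} \inf_{y \in B_R(x)} \underline{u}_{\alpha_n}(y)$$
$$= \sup_{n = 1, 2, \ldots} \liminf_{y \to x} \underline{u}_{\alpha_n}(y) = \lim_{n \to \infty} \underline{u}_{\alpha_n}(x) = u(x),$$

where the second equality holds because the function $\underline{u}_\alpha(y)$ is nondecreasing in $\alpha$, the fourth
equality holds because it is lower semi-continuous, and the last equality follows from (5.5). □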

Lemma 5.5. Under Assumptions (W*) and ($\underline{B}$), the following inequalities hold:

$$\overline{w} + u(x) \ge \min_{a \in A(x)} \left[ c(x,a) + \int_X u(y)\, q(dy|x,a) \right], \quad x \in X. \tag{5.11}$$



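Proof. Let us fix an arbitrary $\varepsilon^* > 0$. Since $\overline{w} = \limsup_{\alpha \uparrow 1} (1-\alpha) m_\alpha$, there exists $\alpha_0 \in [0,1)$ such
that

$$\overline{w} + \varepsilon^* > (1-\alpha) m_\alpha, \quad \alpha \in [\alpha_0, 1). \tag{5.12}$$

Our next goal is to prove the inequality

$$\overline{w} + \varepsilon^* + u(x) \ge \min_{a \in A(x)} \left[ c(x,a) + \alpha \int_X \underline{u}_\alpha(y)\, q(dy|x,a) \right], \quad x \in X,\ \alpha \in [\alpha_0, 1). \tag{5.13}$$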



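Indeed, by (5.10) and (5.12), for every $\alpha, \beta \in [\alpha_0, 1)$ such that $\alpha \le \beta$, and for every $x \in X$,

$$\overline{w} + \varepsilon^* + u_\beta(x) > (1-\beta) m_\beta + u_\beta(x) = \min_{a \in A(x)} \left[ c(x,a) + \beta \int_X u_\beta(y)\, q(dy|x,a) \right]$$
$$\ge \min_{a \in A(x)} \left[ c(x,a) + \alpha \int_X U_\alpha(y)\, q(dy|x,a) \right].$$

As the right-hand side does not depend on $\beta \in [\alpha, 1)$, we have for all $x \in X$ and for all $\alpha \in [\alpha_0, 1)$

$$\overline{w} + \varepsilon^* + U_\alpha(x) = \inf_{\beta \in [\alpha,1)} \left[ \overline{w} + \varepsilon^* + u_\beta(x) \right] \ge \min_{a \in A(x)} \left[ c(x,a) + \alpha \int_X U_\alpha(y)\, q(dy|x,a) \right]$$
$$\ge \min_{a \in A(x)} \left[ c(x,a) + \alpha \int_X \underline{u}_\alpha(y)\, q(dy|x,a) \right] = \min_{a \in A(x)} \eta^{\alpha}_{\underline{u}_\alpha}(x,a).$$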



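By Lemma 3.4, the function $x \mapsto \min_{a \in A(x)} \eta^{\alpha}_{\underline{u}_\alpha}(x,a)$ is lower semi-continuous on $X$. Thus,

$$\liminf_{y \to x} \min_{a \in A(y)} \eta^{\alpha}_{\underline{u}_\alpha}(y,a) \ge \min_{a \in A(x)} \eta^{\alpha}_{\underline{u}_\alpha}(x,a), \quad x \in X,\ \alpha \in [0,1),$$

and, as, by definition (5.2), $\underline{u}_\alpha(x) = \liminf_{y \to x} U_\alpha(y)$, we finally obtain

$$\overline{w} + \varepsilon^* + \underline{u}_\alpha(x) \ge \min_{a \in A(x)} \eta^{\alpha}_{\underline{u}_\alpha}(x,a), \quad x \in X,\ \alpha \in [\alpha_0, 1). \tag{5.14}$$

As, by (5.4), $u(x) = \sup_{\alpha \in [\alpha_0,1)} \underline{u}_\alpha(x)$ for all $x \in X$, (5.14) yields (5.13).

To complete the proof of the lemma, we fix an arbitrary $x \in X$. By Lemma 3.4, for any
$\alpha \in [0,1)$ there exists $a_\alpha \in A(x)$ such that $\min_{a \in A(x)} \eta^{\alpha}_{\underline{u}_\alpha}(x,a) = \eta^{\alpha}_{\underline{u}_\alpha}(x,a_\alpha)$. Since $\underline{u}_\alpha \ge 0$, for
$\alpha \in [\alpha_0, 1)$ the inequality (5.13) can be continued as

$$\overline{w} + \varepsilon^* + u(x) \ge \eta^{\alpha}_{\underline{u}_\alpha}(x,a_\alpha) \ge c(x,a_\alpha). \tag{5.15}$$

Thus, for all $\alpha \in [\alpha_0, 1)$

$$a_\alpha \in D_{\eta^{\alpha}_{\underline{u}_\alpha}(x,\cdot)}(\overline{w} + \varepsilon^* + u(x)) \subset D_{c(x,\cdot)}(\overline{w} + \varepsilon^* + u(x)) \subset A(x).$$

By Lemma 3.3, the set $D_{c(x,\cdot)}(\overline{w} + \varepsilon^* + u(x))$ is compact. Thus, for every sequence $\beta_n \uparrow 1$ of
numbers from $[\alpha_0, 1)$ there is a subsequence $\{\alpha_n\}_{n \ge 1}$ such that the sequence $\{a_{\alpha_n}\}_{n \ge 1}$ converges
and $a^* := \lim_{n \to \infty} a_{\alpha_n} \in A(x)$.

Consider a sequence $\alpha_n \uparrow 1$ such that $a_{\alpha_n} \to a^*$ for some $a^* \in A(x)$. Due to Lemma 3.5 and
Corollary 5.4,

$$\liminf_{n \to \infty} \alpha_n \int_X \underline{u}_{\alpha_n}(y)\, q(dy|x,a_{\alpha_n}) \ge \int_X u(y)\, q(dy|x,a^*). \tag{5.16}$$

Since the function $c$ is lower semi-continuous, (5.15) and (5.16) imply

$$\overline{w} + \varepsilon^* + u(x) \ge \limsup_{n \to \infty} \eta^{\alpha_n}_{\underline{u}_{\alpha_n}}(x,a_{\alpha_n}) \ge c(x,a^*) + \int_X u(y)\, q(dy|x,a^*) \ge \min_{a \in A(x)} \eta^{1}_{u}(x,a).$$

Since $\overline{w} + \varepsilon^* + u(x) \ge \min_{a \in A(x)} \eta^{1}_{u}(x,a)$ for any $\varepsilon^* > 0$, this is also true when $\varepsilon^* = 0$. □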

Proof of Theorem 5.2. Lemma 5.3 contains statements (a) and (e). Since $\mathrm{Gr}(A^*) = \{(x,a) \in
\mathrm{Gr}(A) : g(x,a) \ge 0\}$, where $g(x,a) = \overline{w} + u(x) - c(x,a) - \int_X u(y)\, q(dy|x,a)$ is a Borel function,
the set $\mathrm{Gr}(A^*)$ is Borel. The sets $A^*(x)$, $x \in X$, are compact in view of Lemma 3.3(b). Thus,
statement (b) is proved. The Arsenin-Kunugui theorem implies the existence of a stationary
policy $\phi$ such that $\phi(x) \in A^*(x)$ for all $x \in X$. Statement (d) follows from Lemma 3.4 and the
Arsenin-Kunugui theorem. The rest follows from Theorem 3.1. □
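For a finite model, the sets $A^*(x)$ from (5.6) can be computed by checking the optimality inequality action by action. The sketch below does this for a hypothetical two-state deterministic model (not from the paper) in which $v_\alpha(1) = 0$ and $v_\alpha(0) = \min\{1/(1-\alpha), 2\}$, so that $\overline{w} = 0$ and $u = (2, 0)$; these values and all names are assumptions of the illustration.

```python
# Hypothetical two-state model: state 0 offers "stay" (cost 1, remain in 0)
# and "invest" (cost 2, move to 1); state 1 is absorbing with zero cost.
COST = {(0, "stay"): 1.0, (0, "invest"): 2.0, (1, "rest"): 0.0}
NEXT = {(0, "stay"): 0, (0, "invest"): 1, (1, "rest"): 1}
ACTIONS = {0: ["stay", "invest"], 1: ["rest"]}

# Computed by hand for this model: m_alpha = 0 for all alpha, hence w_bar = 0,
# and u_alpha(0) = 2 for alpha > 1/2, hence u = (2, 0).
W_BAR = 0.0
U = {0: 2.0, 1: 0.0}


def satisfies_inequality(x, a, tol=1e-12):
    """Check w_bar + u(x) >= c(x, a) + u(next state), i.e. membership in A*(x)."""
    return W_BAR + U[x] + tol >= COST[(x, a)] + U[NEXT[(x, a)]]


A_STAR = {x: [a for a in ACTIONS[x] if satisfies_inequality(x, a)] for x in ACTIONS}


def average_cost(policy, x0, horizon):
    """Average cost over `horizon` steps of a deterministic stationary policy."""
    x, total = x0, 0.0
    for _ in range(horizon):
        a = policy[x]
        total += COST[(x, a)]
        x = NEXT[(x, a)]
    return total / horizon


phi = {x: A_STAR[x][0] for x in ACTIONS}          # a selector of A*
print(A_STAR, average_cost(phi, 0, 10_000))
```

Here the myopically cheaper action "stay" is excluded from $A^*(0)$, and the selector of $A^*$ attains an average cost that tends to $\overline{w} = 0$ as the horizon grows.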

Theorem 5.6. Suppose Assumptions (W*) and (B) hold. Then all the conclusions of Theorem 5.2
hold and, in addition, for a stationary policy $\phi$ satisfying (3.3) with $u$ defined in (5.1),

$$w^{\phi}(x) = \underline{w} = \lim_{\alpha \uparrow 1} (1-\alpha) v_\alpha(x) = \lim_{N \to \infty} \frac{1}{N} v^{\phi}_{N}(x), \quad x \in X. \tag{5.17}$$

Proof. Consider a sequence $\{\alpha(n)\}_{n \ge 1}$ such that $\alpha(n) \uparrow 1$ as $n \to +\infty$, and

$$\lim_{n \to +\infty} (1-\alpha(n))\, m_{\alpha(n)} = \underline{w}.$$

Define the following nonnegative functions on $X$:

$$\tilde U_n(x) = \inf_{m \ge n} u_{\alpha(m)}(x), \qquad \tilde u_n(x) = \liminf_{y \to x} \tilde U_n(y), \qquad n \ge 1,\ x \in X,$$

and

$$\tilde u(x) = \sup_{n \ge 1} \tilde u_n(x), \quad x \in X. \tag{5.18}$$

Observe that

$$\tilde u_n(x) \le \tilde U_n(x) \le \limsup_{m \to +\infty} u_{\alpha(m)}(x) < \infty, \quad x \in X,\ n = 1, 2, \ldots, \tag{5.19}$$

where the first two inequalities follow from the definitions of $\tilde u_n$ and $\tilde U_n$ respectively, and the
last inequality follows from Assumption (B). As follows from (5.18) and (5.19),
$\tilde u(x) \le \limsup_{m \to +\infty} u_{\alpha(m)}(x) < +\infty$. According to Feinberg and Lewis [14, Lemma 3.1], the func-
tions $\tilde u_n$, $n \ge 1$, are lower semi-continuous on $X$. Therefore, their supremum $\tilde u$ is also lower
semi-continuous. In addition,

$$\tilde u(x) = \sup_{n \ge 1} \sup_{R > 0} \inf_{y \in B_R(x)} \inf_{m \ge n} u_{\alpha(m)}(y) = \liminf_{n \to +\infty,\ y \to x} u_{\alpha(n)}(y), \quad x \in X,$$

where the first equality follows from the definitions of $\tilde U_n$, $\tilde u_n$, and $\tilde u$, and the second equality is
the definition of the lim inf. Since $\tilde U_n(x)$ is nondecreasing in $n$, we have $\tilde u_n(x) \uparrow \tilde u(x)$ as $n \to \infty$ for all $x \in X$.
We show next that for each $x \in X$

$$\underline{w} + \tilde u(x) \ge \inf_{a \in A(x)} \left[ c(x,a) + \int_X \tilde u(y)\, q(dy|x,a) \right]. \tag{5.20}$$

Indeed, let us fix any $\varepsilon^* > 0$. By the definition of $\underline{w}$, there exists a subsequence $\{\alpha(n_k)\}_{k \ge 1} \subseteq
\{\alpha(n)\}_{n \ge 1}$ such that for $k = 1, 2, \ldots$

$$\underline{w} + \varepsilon^* \ge (1-\alpha(n_k))\, m_{\alpha(n_k)}.$$

Let $x \in X$ be an arbitrary state. By Theorem 4.1, for each $k \ge 1$ there exists $a_{n_k} \in A_{\alpha(n_k)}(x)$ such
that

$$(1-\alpha(n_k))\, m_{\alpha(n_k)} + u_{\alpha(n_k)}(x) = c(x,a_{n_k}) + \alpha(n_k) \int_X u_{\alpha(n_k)}(y)\, q(dy|x,a_{n_k}).$$

Thus, similarly to the proof of Lemma 5.5, we get (5.20).

From Lemma 3.4 and the Arsenin-Kunugui theorem, there exists a stationary policy $\tilde\phi \in F$ such
that for any $x \in X$

$$\underline{w} + \tilde u(x) \ge c(x,\tilde\phi(x)) + \int_X \tilde u(y)\, q(dy|x,\tilde\phi(x)). \tag{5.21}$$

Thus, by Schäl [24, Proposition 1.3] described in (3.2), for all $x \in X$

$$\overline{w} = \underline{w} = w^*(x) = w^{\tilde\phi}(x) = \lim_{\alpha \uparrow 1} (1-\alpha) v_\alpha(x) = w^*. \tag{5.22}$$

Let us choose any stationary policy $\phi$ such that inequalities (3.2) and (3.3) hold with the func-
tion $u$ defined in (5.1). Since $\underline{w} = \overline{w}$, according to Theorem 5.2 such a stationary policy exists.
Theorem 3.1 implies that the stationary policy $\phi$ satisfies (3.4), and Schäl [24, Proposition 1.3] (see
(3.2)) implies that (5.22) holds with $\tilde\phi = \phi$.

In addition, (5.22) with $\tilde\phi = \phi$ implies that for all $x \in X$

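$$w^{\phi}(x) = \lim_{\alpha \uparrow 1} (1-\alpha) m_\alpha = \lim_{\alpha \uparrow 1} (1-\alpha)\big(v_\alpha(x) - u_\alpha(x)\big) = \lim_{\alpha \uparrow 1} (1-\alpha) v_\alpha(x),$$

where the last equality follows from Assumption (B). Thus, for all $x \in X$

$$w^{\phi}(x) = \limsup_{N \to \infty} \frac{1}{N} v^{\phi}_{N}(x) \ge \limsup_{\alpha \uparrow 1} (1-\alpha) v^{\phi}_{\alpha}(x) \ge \liminf_{\alpha \uparrow 1} (1-\alpha) v^{\phi}_{\alpha}(x)$$
$$\ge \lim_{\alpha \uparrow 1} (1-\alpha) v_\alpha(x) = w^{\phi}(x),$$

where the first inequality follows from the Tauberian theorem (see Sennott [25, Section A.4] or
[26, Proposition 5.7]), and the last inequality follows from $v^{\phi}_{\alpha}(x) \ge v_\alpha(x)$ and the existence of
the limit. So, we have the existence of $\lim_{\alpha \uparrow 1} (1-\alpha) v^{\phi}_{\alpha}(x)$. Thus, the Karamata Tauberian theorem
(Sennott [25, Section A.4] or [26, Proposition 5.7]) implies $w^{\phi}(x) = \lim_{N \to \infty} \frac{1}{N} v^{\phi}_{N}(x)$. □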
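The agreement in (5.17) between the Abel-type limit $(1-\alpha) v^{\phi}_{\alpha}(x)$ and the Cesàro-type limit $\frac{1}{N} v^{\phi}_{N}(x)$ can be observed numerically for a fixed stationary policy on a finite chain. The transition matrix `P` and cost vector `C` below are hypothetical; for this chain the stationary distribution is $(7/8, 1/8)$, so the long-run average cost is $0.9375$.

```python
# Fixed stationary policy on a hypothetical two-state chain.
P = [[0.9, 0.1], [0.7, 0.3]]    # transition matrix under the policy
C = [1.0, 0.5]                  # one-step costs under the policy


def step(v, alpha):
    """Return the vector c + alpha * P v."""
    return [C[x] + alpha * sum(P[x][y] * v[y] for y in range(2)) for x in range(2)]


def finite_horizon_average(n_steps):
    """(1/N) v_N: undiscounted N-step expected total cost divided by N."""
    v = [0.0, 0.0]
    for _ in range(n_steps):
        v = step(v, 1.0)
    return [val / n_steps for val in v]


def abel_mean(alpha, n_iter):
    """(1 - alpha) v_alpha, with v_alpha approximated by n_iter iterations."""
    v = [0.0, 0.0]
    for _ in range(n_iter):
        v = step(v, alpha)
    return [(1.0 - alpha) * val for val in v]


cesaro = finite_horizon_average(20_000)
abel = abel_mean(0.999, 30_000)
print("Cesaro means:", cesaro)
print("Abel means:  ", abel)
```

Both vectors are close to $0.9375$ in every state; the remaining gap is of order $1/N$ for the Cesàro means and of order $1-\alpha$ for the Abel means.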

Corollary 5.7. Under Assumptions (W*) and (B), the conclusions of Theorems 5.2 and 5.6 remain
correct, if the function $u$ is substituted with the function $\tilde u$ defined in (5.18).

Proof. As shown in the proof of Theorem 5.6, there exists a stationary policy satisfying (5.21).
The function $\tilde u$ is nonnegative, lower semi-continuous, and takes finite values. Thus, both [24,
Proposition 1.3] (see (3.2)) and Theorem 3.1 can be applied to this function. The proof of
statements (a)-(d) of Theorem 5.2 uses just these properties of $u$. Statement (e) follows from
Lemma 5.3, whose proof remains unchanged if $u$ is replaced with $\tilde u$. □



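6 Approximation of Average Cost Optimal Strategies by α-discount Optimal Strategies

For the family of sets $\{\mathrm{Gr}(A_\alpha)\}_{\alpha \in (0,1)}$ considered in Theorem 4.1, we consider
its upper topological limit

$$\operatorname{Lim}_{\alpha \uparrow 1} \mathrm{Gr}(A_\alpha) = \left\{ (x,a) \in X \times A : \exists\, \alpha_n \uparrow 1,\ n \to +\infty,\ \exists\, (x_n,a_n) \in \mathrm{Gr}(A_{\alpha_n}),\ n \ge 1, \text{ such that } (x,a) = \lim_{n \to +\infty} (x_n,a_n) \right\}$$

defined, for example, in Zgurovsky et al. [30, Chapter 1, p. 3]. Let us set

$$A^{app}(x) := \left\{ a \in A^*(x) : (x,a) \in \operatorname{Lim}_{\alpha \uparrow 1} \mathrm{Gr}(A_\alpha) \right\}, \quad x \in X.$$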

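Theorem 6.1. Under Assumptions (W*) and ($\underline{B}$), the graph $\mathrm{Gr}(A^{app})$ is a Borel subset of $\mathrm{Gr}(A^*)$,
and for each $x \in X$ the set $A^{app}(x)$ is nonempty and compact. Furthermore, there exists a station-
ary policy $\phi^{app}$ such that $\phi^{app}(x) \in A^{app}(x)$ for all $x \in X$, and any such policy is average-cost
optimal.

Proof. Let us fix an arbitrary $x \in X$. From (5.1) (the definition of $u$), there exists a sequence
$\{(y_n, \alpha_n)\}_{n \ge 1} \subset X \times (0,1)$ such that $y_n \to x$, $\alpha_n \uparrow 1$, $u_{\alpha_n}(y_n) \to u(x)$, $n \to +\infty$.

Let us choose an arbitrary $\varepsilon^* > 0$ and $b_n \in A_{\alpha_n}(y_n)$, $n \ge 1$. Since $\overline{w} = \limsup_{\alpha \uparrow 1} (1-\alpha) m_\alpha$,
there exists $N \ge 1$ such that $u(x) + \frac{\varepsilon^*}{2} \ge u_{\alpha_n}(y_n)$ and $\overline{w} + \frac{\varepsilon^*}{2} \ge (1-\alpha_n) m_{\alpha_n}$ for all $n \ge N$.
By definition of the sets $A_\alpha(\cdot)$, for each $n \ge N$

$$(1-\alpha_n) m_{\alpha_n} + u_{\alpha_n}(y_n) = c(y_n,b_n) + \alpha_n \int_X u_{\alpha_n}(y)\, q(dy|y_n,b_n) = \eta^{\alpha_n}_{u_{\alpha_n}}(y_n,b_n).$$

Thus, for all $n \ge N$

$$\overline{w} + \varepsilon^* + u(x) \ge \eta^{\alpha_n}_{u_{\alpha_n}}(y_n,b_n) \ge \eta^{\alpha_n}_{U_{\alpha_n}}(y_n,b_n) \ge \eta^{\alpha_n}_{\underline{u}_{\alpha_n}}(y_n,b_n) \ge c(y_n,b_n).$$

Therefore, because of Assumption (W*)(ii), the sequence $\{b_n\}_{n \ge 1}$ has a subsequence $\{b_{n_k}\}_{k \ge 1}$
such that $b_{n_k} \to a$, as $k \to +\infty$, for some $a \in A(x)$. Thus, $(x,a) \in \operatorname{Lim}_{\alpha \uparrow 1} \mathrm{Gr}(A_\alpha)$.

Let us prove that $(x,a) \in \mathrm{Gr}(A^*)$. Indeed, as $\alpha_{n_k} \underline{u}_{\alpha_{n_k}}(\cdot) \uparrow u(\cdot)$, $k \to +\infty$, then due to
Lemma 3.5 and Corollary 5.4,

$$\liminf_{k \to +\infty} \alpha_{n_k} \int_X \underline{u}_{\alpha_{n_k}}(y)\, q(dy|y_{n_k},b_{n_k}) \ge \int_X u(y)\, q(dy|x,a).$$

Thus, by Lemma 3.4, $\overline{w} + \varepsilon^* + u(x) \ge \eta^{1}_{u}(x,a)$, and this is true for any $\varepsilon^* > 0$. This implies
$\overline{w} + u(x) \ge \eta^{1}_{u}(x,a)$. This inequality means that $(x,a) \in \mathrm{Gr}(A^*)$ and $A^{app}(x) \ne \emptyset$, since
$(x,a) \in \operatorname{Lim}_{\alpha \uparrow 1} \mathrm{Gr}(A_\alpha)$. The set $A^{app}(x)$ is compact because of the closedness of $\operatorname{Lim}_{\alpha \uparrow 1} \mathrm{Gr}(A_\alpha)$
(see Zgurovsky et al. [30, Chapter 1, p. 3]) and Theorem 5.2(b). The second statement of the
theorem follows from the Arsenin-Kunugui theorem. □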

Corollary 6.2. Under Assumptions (W*) and ($\underline{B}$), for any stationary average-cost optimal policy
$\phi^{app}$ such that $\phi^{app}(x) \in A^{app}(x)$ for all $x \in X$, for every $x \in X$ there exist $\alpha_n(x) \uparrow 1$ and
$y_n(x) \to x$ as $n \to +\infty$ such that $a_n(x) \in A_{\alpha_n(x)}(y_n(x))$, $n \ge 1$, and $\phi^{app}(x) = \lim_{n \to +\infty} a_n(x)$.

Proof. Following Theorem 6.1, consider a stationary average-cost optimal policy $\phi^{app}$ such that
$\phi^{app}(x) \in A^{app}(x)$ for all $x \in X$. Furthermore, since $A^{app}(x) \subset A^*(x)$ for all $x \in X$, any
such policy is optimal. Let us fix an arbitrary $x \in X$. By definition of $A^{app}(x)$, we have that
$(x, \phi^{app}(x)) \in \operatorname{Lim}_{\alpha \uparrow 1} \mathrm{Gr}(A_\alpha)$. Then, there exist $\alpha_n(x) \uparrow 1$, $n \to +\infty$, and
$(y_n(x), a_n(x)) \in \mathrm{Gr}(A_{\alpha_n(x)})$, $n \ge 1$, such that $(x, \phi^{app}(x)) = \lim_{n \to +\infty} (y_n(x), a_n(x))$, i.e.
$\phi^{app}(x) = \lim_{n \to +\infty} a_n(x)$, where $a_n(x) \in A_{\alpha_n(x)}(y_n(x))$, $n \ge 1$, $\alpha_n(x) \uparrow 1$ and $y_n(x) \to x$ as $n \to +\infty$. □

We remark that, if we replace in (5.6) the function $u$ with $\tilde u$ defined in (5.18), Theorem 6.1 and
Corollary 6.2 remain correct.
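The approximation described in Theorem 6.1 and Corollary 6.2 can be traced on a finite model, where the upper topological limit reduces to the set of actions that are discount-optimal along some sequence $\alpha_n \uparrow 1$. The sketch below uses a hypothetical two-state deterministic model (not from the paper); the helper names are assumptions of the illustration.

```python
# Hypothetical two-state model: state 0 offers "stay" (cost 1, remain in 0)
# and "invest" (cost 2, move to 1); state 1 is absorbing with zero cost.
COST = {(0, "stay"): 1.0, (0, "invest"): 2.0, (1, "rest"): 0.0}
NEXT = {(0, "stay"): 0, (0, "invest"): 1, (1, "rest"): 1}
ACTIONS = {0: ["stay", "invest"], 1: ["rest"]}


def discounted_values(alpha, tol=1e-12, max_iter=1_000_000):
    """Approximate v_alpha by value iteration started from v = 0."""
    v = {0: 0.0, 1: 0.0}
    for _ in range(max_iter):
        new_v = {x: min(COST[(x, a)] + alpha * v[NEXT[(x, a)]] for a in ACTIONS[x])
                 for x in ACTIONS}
        if max(abs(new_v[x] - v[x]) for x in v) < tol:
            return new_v
        v = new_v
    raise RuntimeError("value iteration did not converge")


def optimal_actions(alpha, tol=1e-9):
    """Sets A_alpha(x) of actions attaining the minimum in (4.2), up to tol."""
    v = discounted_values(alpha)
    sets = {}
    for x in ACTIONS:
        vals = {a: COST[(x, a)] + alpha * v[NEXT[(x, a)]] for a in ACTIONS[x]}
        best = min(vals.values())
        sets[x] = [a for a in ACTIONS[x] if vals[a] <= best + tol]
    return sets


for alpha in (0.25, 0.9, 0.99, 0.999):
    print(alpha, optimal_actions(alpha))
```

For small discount factors the myopic action "stay" is optimal in state 0, at $\alpha = 1/2$ both actions are discount-optimal, and for every $\alpha > 1/2$ the only discount-optimal action is "invest". Hence the only limit point as $\alpha \uparrow 1$ is "invest", which in this model is also the only action satisfying the optimality inequality in (5.6).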
Let us set 

$$X_\alpha := \{x \in X : v_\alpha(x) = m_\alpha\}, \quad \alpha \in [0,1).$$

Under Assumption (G), $m_\alpha < \infty$. If Assumptions (G) and (Wu) hold, then Theorem 4.1 im-
plies that $X_\alpha$ is a compact set for each $\alpha \in [0,1)$. This fact is useful to establish the validity of
Assumption (G); see Feinberg and Lewis [14, Lemma 5.1] and references therein.

Theorem 6.3. Let Assumptions (G) and (Wu) hold. Then there exists a compact set $\mathcal{K} \subseteq X$ such
that $X_\alpha \subset \mathcal{K}$ for each $\alpha \in [0,1)$.

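Proof. From Assumption (G) and Theorem 4.1 we have that for each $\alpha \in [0,1)$

$$\emptyset \ne X_\alpha = \{x \in X : u_\alpha(x) = 0\} = D_{u_\alpha}(0) \subset D_{U_\alpha}(0) \subset D_{\underline{u}_\alpha}(0) \subset D_{\underline{u}_0}(0).$$

By virtue of Lemma 5.3, $\underline{u}_0 : X \to [0,+\infty)$ is an inf-compact function on $X$. Setting
$\mathcal{K} = D_{\underline{u}_0}(0)$, we obtain the statement of the theorem. □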



7 Illustrative Example 

The following example is from Hernández-Lerma [16]. Let

$$x_{n+1} = \gamma x_n + \beta a_n + \xi_n, \quad n = 0, 1, \ldots,$$

and

$$c(x,a) = q x^2 + r a^2,$$

where (a) $q$ and $r$ are positive constants, $\gamma$ and $\beta$ are two constants satisfying $\gamma\beta > 0$, and (b) $\xi_n$
are independent and identically distributed (iid) random variables with zero mean, finite variance,
and continuous density.

This problem is solved in Hernández-Lerma [16], where a stationary average-cost optimal pol-
icy is computed. This problem corresponds to an MDP with $X = A = \mathbb{R}$ and with setwise
continuous transition probabilities. However, if $\xi_n$ do not have a density, the transition probabilities
may not be setwise continuous, but they are weakly continuous; see Feinberg and Lewis [13, p.
48] for details. If $\xi_n$ are arbitrary iid random variables with zero mean and finite variance, this
problem satisfies Assumption (Wu) and, similarly to the case when there are densities, it satisfies
Assumption (B). Thus, Theorem 5.6 can be applied. The optimal policy provided in Hernández-
Lerma [16] is also optimal when $\xi_n$ may not have a density.
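The case of noise without a density can be explored numerically. The sketch below is an assumption-laden illustration: it uses the standard scalar Riccati recursion for linear-quadratic control (not a formula quoted from [16]), hypothetical parameter values, and noise uniform on $\{-1, +1\}$, which has zero mean, unit variance, and no density. Under these assumptions the linear policy $a = -f x$ obtained from the Riccati fixed point $k$ has long-run average cost $k \cdot \mathrm{Var}(\xi)$, which the simulation approximates.

```python
import random

GAMMA, BETA, Q_COST, R_COST = 1.2, 1.0, 1.0, 1.0    # hypothetical parameters


def riccati_fixed_point(n_iter=200):
    """Iterate k <- q + gamma^2 * r * k / (r + beta^2 * k), the scalar Riccati map."""
    k = Q_COST
    for _ in range(n_iter):
        k = Q_COST + GAMMA ** 2 * R_COST * k / (R_COST + BETA ** 2 * k)
    return k


K = riccati_fixed_point()
GAIN = GAMMA * BETA * K / (R_COST + BETA ** 2 * K)  # linear policy a = -GAIN * x
RHO = GAMMA - BETA * GAIN                           # closed-loop coefficient


def simulate_average_cost(n_steps, seed=0):
    """Long-run average cost of a = -GAIN * x with noise uniform on {-1, +1}."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        a = -GAIN * x
        total += Q_COST * x * x + R_COST * a * a
        x = GAMMA * x + BETA * a + rng.choice((-1.0, 1.0))
    return total / n_steps


# Exact stationary average cost of the linear policy: (q + r f^2) * Var(xi) / (1 - rho^2).
exact = (Q_COST + R_COST * GAIN ** 2) * 1.0 / (1.0 - RHO ** 2)
simulated = simulate_average_cost(200_000)
print("K * Var(xi) =", K, " exact =", exact, " simulated =", simulated)
```

The exact stationary cost coincides with $k \cdot \mathrm{Var}(\xi)$ because the Riccati fixed point satisfies $k (1 - \rho^2) = q + r f^2$ for the closed-loop coefficient $\rho = \gamma - \beta f$; only second moments of $\xi_n$ enter, so the absence of a density plays no role in the computation.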



A Proof of Lemma 3.5 



Proof. First, we prove the lemma for uniformly bounded above functions $h_n$. Let $h_n(s) \le K < \infty$
for all $n = 1, 2, \ldots$ and all $s \in S$. For $n = 1, 2, \ldots$ and $s \in S$, define

$$H_n(s) = \inf_{m \ge n} h_m(s) \quad \text{and} \quad \underline{h}_n(s) = \liminf_{s' \to s} H_n(s').$$

The functions $\underline{h}_n : S \to [0,+\infty)$, $n = 1, 2, \ldots$, are lower semi-continuous; see, for example,
Feinberg and Lewis [14, Lemma 3.1]. In addition, for $s \in S$

$$\underline{h}_n(s) \uparrow h(s) \quad \text{as } n \to \infty. \tag{A.1}$$
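Weak convergence of $\{\mu_n\}_{n \ge 1}$ to $\mu$ is equivalent to

$$\liminf_{n \to \infty} \mu_n(A) \ge \mu(A) \quad \text{for all } A \in \mathcal{O}, \tag{A.2}$$

where $\mathcal{O}$ is the family of all open subsets of the space $S$; Billingsley [5, Theorem 2.1].
Fix an arbitrary $t > 0$. By (A.1), $h(s) > t$ if and only if $\underline{h}_n(s) > t$ for some $n = 1, 2, \ldots$, and

$$\{s \in S : h(s) > t\} = \bigcup_{n \ge 1} S_n, \tag{A.3}$$

where

$$S_n = \{s \in S : \underline{h}_n(s) > t\}, \quad n = 1, 2, \ldots,$$

are open sets, since the functions $\underline{h}_n$ are lower semi-continuous. In addition,

$$S_n \subseteq S_{n+1}, \quad n = 1, 2, \ldots. \tag{A.4}$$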
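Thus,

$$\mu(\{s \in S : h(s) > t\}) = \lim_{n \to +\infty} \mu(S_n) \le \lim_{n \to +\infty} \liminf_{m \to +\infty} \mu_m(S_n)$$
$$\le \limsup_{n \to +\infty} \liminf_{m \to +\infty} \mu_m(S_m) = \liminf_{n \to +\infty} \mu_n(S_n) = \liminf_{n \to +\infty} \mu_n(\{s \in S : \underline{h}_n(s) > t\}),$$

where the first equality follows from (A.4) and (A.3), the first inequality follows from (A.2), and
the second inequality follows from (A.4).
Thus, Serfozo [27, Lemma 2.1] yields

$$\int_S h(s)\, \mu(ds) \le \liminf_{n \to +\infty} \int_S \underline{h}_n(s)\, \mu_n(ds) \le \liminf_{n \to +\infty} \int_S h_n(s)\, \mu_n(ds),$$

where the second inequality is fulfilled due to

$$\underline{h}_n(s) \le H_n(s) \le h_n(s), \quad s \in S,\ n = 1, 2, \ldots.$$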

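Case 2. Consider a sequence $\{h_n\}_{n \ge 1}$ of measurable nonnegative $\overline{\mathbb{R}}$-valued functions on $S$.
For $\lambda > 0$ set $h^{\lambda}_n(s) := \min\{h_n(s), \lambda\}$, $s \in S$, $n = 1, 2, \ldots$. Since the functions $h^{\lambda}_n$ are uniformly
bounded above,

$$\int_S h^{\lambda}(s)\, \mu(ds) \le \liminf_{n \to +\infty} \int_S h^{\lambda}_n(s)\, \mu_n(ds) \le \liminf_{n \to +\infty} \int_S h_n(s)\, \mu_n(ds),$$

where $h^{\lambda}(s) = \liminf_{n \to +\infty,\ s' \to s} h^{\lambda}_n(s')$, $\lambda > 0$, $s \in S$.

Then, using Fatou's lemma,

$$\int_S h(s)\, \mu(ds) \le \liminf_{\lambda \to +\infty} \int_S h^{\lambda}(s)\, \mu(ds).$$

□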
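The role of the joint lower limit in the definition of $h$ can be seen from a simple worked example, which is added here as an illustration under stated assumptions and is not part of the original proof. Take $S = \mathbb{R}$, $\mu_n = \delta_{1/n}$ (the unit mass at $1/n$), $\mu = \delta_0$, and $h_n = \mathbf{1}_{(-\infty, 0]}$ for all $n$. Then $\mu_n \to \mu$ weakly, and

$$\int_S h_n(s)\, \mu_n(ds) = h_n(1/n) = 0 \quad \text{for all } n, \qquad \int_S \liminf_{n \to \infty} h_n(s)\, \mu(ds) = \mathbf{1}_{(-\infty,0]}(0) = 1,$$

so the inequality fails if the pointwise lower limit $\liminf_{n \to \infty} h_n(s)$ is used. With the lower limit taken jointly in $n$ and $s'$, choosing $s' = 1/n \to 0$ gives

$$h(0) = \liminf_{n \to +\infty,\ s' \to 0} h_n(s') = 0, \qquad \int_S h(s)\, \mu(ds) = 0 \le 0 = \liminf_{n \to +\infty} \int_S h_n(s)\, \mu_n(ds),$$

in agreement with the lemma.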